Last Update: 2020-05-12

1. About swirl

An interactive learning environment for R and statistics.

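swirl is an R package developed for learning R. For details, see the {swirl} website. swirl is easy to use, but for readers who are uneasy about working through its messages unaided, this page restates what appears on screen (the Console Log) and adds some commentary. It therefore also contains the answers to the exercises. What matters, however, is to think about each exercise and try it yourself, so skimming this file and assuming you have understood it is not recommended.

Summaries of swirl's messages are written in italics.

1.1 Getting started with swirl

The following assumes that you use the RStudio Console. swirl can be used without RStudio, but the {swirl} site recommends RStudio, so use it.

  • Either RStudio.cloud on the Internet or RStudio installed on your own computer is fine.
  • To use RStudio installed on your computer, you must install R first.
  • On RStudio.cloud, create a Project before you start. A Project you created earlier also works.

1.2 Up to launching swirl

  1. If you have never used swirl, you need to install the swirl package. If you are not sure whether it is installed, select the Packages tab in the lower-right pane of RStudio and check there; installing it again does no harm. Alternatively, type library(swirl) in the Console (the lower-left pane) and see whether an error message appears. (This command is the step that follows installation anyway.)

  2. From Tools in the RStudio menu bar at the top, choose Install Packages, enter swirl, and click Install. You can also type install.packages("swirl") directly in the Console. Either way, this command is executed (it appears in the Console) and the package is installed.

  3. To make the swirl package available, type the following in the Console: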
> library(swirl)

and press Enter. (If swirl is already installed, start from this step.) That completes the preparation. You should see a message like the following.

| Hi! Type swirl() when you are ready to begin.

Hello. When you are ready, type swirl().

  4. Now launch swirl. In the Console, type
> swirl()

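and press Enter. swirl starts.

Before you begin, you can select the Help tab in the lower-right pane of RStudio and type swirl into the search box (the one with the magnifying-glass icon) to display the basics. The same material is explained during the lessons below and on the {swirl} site.

1.3 From launch through the explanation of how swirl works to course selection

When > appears in the Console, R is waiting for a command. This symbol is called the prompt. Below, when a line is written as > swirl(), the only part you type is swirl().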
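
The installation and launch steps above can be collected into a short script. This is a minimal sketch rather than part of the swirl log: the helper name pkg_available is made up for illustration, and the install and launch calls are guarded by interactive() so that nothing is downloaded when the script runs non-interactively.

```r
# Hypothetical helper: TRUE if the package can be loaded, FALSE otherwise.
pkg_available <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (interactive()) {
  # Install swirl only when it is missing.
  if (!pkg_available("swirl")) {
    install.packages("swirl")
  }
  library(swirl)  # attach the package
  # swirl()       # uncomment to launch the interactive session
}
```

Checking before installing avoids a repeated download, although, as noted above, installing again does no harm.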

> swirl()

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you did then.
| If you are new, call yourself something unique.

What shall I call you? 

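Welcome to swirl. Please sign in with a name. If you have been here before, use the same name as you did then. If you are new, enter a name that is unique to you.

What shall I call you?

The nickname of the FOODS4ALL administrator is SL, so I enter SL. (swirl appears to keep the learning record under this name. Any name will do.)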

What shall I call you? SL
| Thanks, SL. Let's cover a couple of quick housekeeping items before we begin our first lesson.
| First of all, you should know that when you see '...', that means you should press Enter when
| you are done reading and ready to continue.

...  <-- That's your cue to press Enter to continue

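Thanks, SL (the registered name). Before we begin, let's go over a few basics.
First, when you see '...', read the explanation above it and press Enter. In other words, it means "next".
... <-- This is the "next" cue, so once you have read this far, press Enter.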

| Also, when you see 'ANSWER:', the R prompt (>), or when you are asked to select from a list,
| that means it's your turn to enter a response, then press Enter to continue.

Select 1, 2, or 3 and press Enter 

1: Continue.
2: Proceed.
3: Let's get going!

Selection: 1

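When you see 'ANSWER:', the prompt >, or a list of choices followed by Selection:, it is your turn to respond. Type your input and press Enter.
1: Continue.
2: Proceed.
3: Let's get going!

Selection: 1 (entered by SL)

The 1 after Selection: above was typed by SL. From here on, the log includes SL's input; treat it as a typical response. In this case, however, choosing 1, 2, or 3 makes no difference.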

| You can exit swirl and return to the R prompt (>) at any time by pressing the Esc key. If you
| are already at the prompt, type bye() to exit and save your progress. When you exit properly,
| you'll see a short message letting you know you've done so.

| When you are at the R prompt (>):
| -- Typing skip() allows you to skip the current question.
| -- Typing play() lets you experiment with R on your own; swirl will ignore what you do...
| -- UNTIL you type nxt() which will regain swirl's attention.
| -- Typing bye() causes swirl to exit. Your progress will be saved.
| -- Typing main() returns you to swirl's main menu.
| -- Typing info() displays these options again.

| Let's get started!

...

Pressing the Esc key exits swirl and returns you to the R prompt (>) at any time. If you are already at the R prompt (>), type bye(); swirl exits and saves your progress up to that point. A short message confirms that swirl has exited and saved.

At the R prompt (>), you can do the following:
skip() skips the current question.
play() lets you experiment in R on your own. swirl ignores what you type until you enter nxt().
bye() saves your progress and exits swirl.
main() returns you to swirl's main menu.
info() displays these options again.

| To begin, you must install a course. I can install a course for you from the internet, or I can
| send you to a web page (https://github.com/swirldev/swirl_courses) which will provide course
| options and directions for installing courses yourself. (If you are not connected to the
| internet, type 0 to exit.)

1: R Programming: The basics of programming in R
2: Regression Models: The basics of regression modeling in R
3: Statistical Inference: The basics of statistical inference in R
4: Exploratory Data Analysis: The basics of exploring data in R
5: Don't install anything for me. I'll do it myself.

Selection: 1

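To begin, you must install a course. swirl can install one for you automatically from the Internet, or you can look for one yourself on the web page (https://github.com/swirldev/swirl_courses), which gives course options and installation directions. (If you are not connected to the Internet, type 0 to exit.)

1: R Programming: The basics of programming in R
2: Regression Models: The basics of regression modeling in R
3: Statistical Inference: The basics of statistical inference in R
4: Exploratory Data Analysis: The basics of exploring data in R
5: Don't install anything for me. I'll do it myself.

Selection: 1 (SL chose 1)

We start with the basic course, "R Programming: The basics of programming in R". If you already know the basics of R, feel free to try the other courses. You can also go a little further, look at the menu of lessons 1-15, and then decide.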

|=========================================================================================| 100%

| Course installed successfully!


| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

The course was installed successfully.

Please choose a course. To exit swirl, type 0.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1 (entered by SL)

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 1

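Choose the lesson to start with. Typing 0 returns you to the course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers
 4: Vectors                    5: Missing Values             6: Subsetting Vectors
 7: Matrices and Data Frames   8: Logic                      9: Functions
10: lapply and sapply         11: vapply and tapply         12: Looking at Data
13: Simulation                14: Dates and Times           15: Base Graphics

Selection: 1 (entered by SL)

If you already understand lessons 1-15 well, type 0 and try another course. Here we start with lesson 1; if you do not need it, start with another lesson.

Personally I think this is a good course, but 2: Workspace and Files may feel somewhat difficult unless you know a little about the Unix file system. Lessons 9, 10, 11, and possibly 13 may also feel somewhat difficult to people with no programming experience. For what it is worth, my personal recommended order is as follows.

1, 3, 4, 5, 6, 7, 12, 15, 14, 8, 9, 10, 11, 2

Another approach is to work through the lessons in order and, when you get stuck, not hesitate to skip the question with skip(), check the answer, and come back later to try again. Feel free to find the method that suits you. There are other beginner-level swirl courses as well; see the later part of this file.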

  |                                                                                         |   0%

| In this lesson, we will explore some basic building blocks of the R programming language.

...

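0% (The progress indicator continues to appear from here on.)

In this lesson, we look at some basic building blocks of programming in R.

... (This cue has appeared, so press Enter when you are ready. From here on, the cue is not explained again.)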

  |==                                                                                       |   3%
| If at any point you'd like more information on a particular topic related to R, you can type
| help.start() at the prompt, which will open a menu of resources (either within RStudio or your
| default web browser, depending on your setup). Alternatively, a simple web search often yields
| the answer you're looking for.

...

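3% (From here on, the progress line is not explained again.)

If at any point you need information about R, type help.start() at the prompt (>). Depending on your setup, Help opens in RStudio or in your web browser. Searching the web yourself is also a good option.

That completes the preparation. We now move on to Basic Building Blocks.

Note: At the end of each lesson of the beginner R Programming course, the message 'Would you like to receive credit for completing this course on Coursera.org?' appears. Select 2: No. (See the swirl Course Repository.)

Remarks:

  1. As stated at the beginning, this page reproduces the Console Log with commentary, so it includes the answers to the exercises. The answers written here are not the only possible ones, and sometimes an answer alone will not make things clear. In that case, try searching for "swirl R Programming Console Log". You will find several sites that give the answers together with questions and replies. I have not checked those sites myself, so I cannot vouch for them, but I hope you find the route that is easiest for you to follow.

  2. In general, swirl helpfully saves your progress when you exit. When you start again, for example after stopping partway through Functions last time, the following is displayed. To start Functions from the beginning, or to start a different lesson, select 2.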

| Would you like to continue with one of these lessons?

1: R Programming Functions
2: No. Let me start something new.

Selection: 

2. R Programming

2.1. Basic Building Blocks

  |=====                                                                                    |   5%
| In its simplest form, R can be used as an interactive calculator. Type 5 + 7 and press Enter.

> 

In its simplest form, R can be used as a calculator. Type 5 + 7 and press Enter.

The input and the result are shown below. No spaces were typed around +, which is fine.

5+7
## [1] 12
| Perseverance, that's the answer.

Well persevered. That's the answer.

  |=======                                                                                  |   8%
| R simply prints the result of 12 by default. However, R is a programming language and often the
| reason we use a programming language as opposed to a calculator is to automate some process or
| avoid unnecessary repetition.

...

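By default R simply prints the result, 12. However, R is a programming language, and what sets a programming language apart from a calculator is that it lets you automate a process and avoid unnecessary repetition. (Note the ... cue.)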

  |=========                                                                                |  11%
| In this case, we may want to use our result from above in a second calculation. Instead of
| retyping 5 + 7 every time we need it, we can just create a new variable that stores the result.

...

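In this case, we may want to use the result again in a second calculation. Instead of retyping 5 + 7 each time, we can store the result in a new variable. (Note the ... cue again. These reminders stop here.)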

  |============                                                                             |  13%
| The way you assign a value to a variable in R is by using the assignment operator, which is just
| a 'less than' symbol followed by a 'minus' sign. It looks like this: <-

...

To assign a value to a variable in R, use the assignment operator: a 'less than' symbol followed by a 'minus' sign. It looks like this: <-

  |==============                                                                           |  16%
| Think of the assignment operator as an arrow. You are assigning the value on the right side of
| the arrow to the variable name on the left side of the arrow.

...

Think of it as an arrow. The value on the right side of the arrow is assigned to the variable on the left side.

|================                                                                         |  18%
| To assign the result of 5 + 7 to a new variable called x, you type x <- 5 + 7. This can be read
| as 'x gets 5 plus 7'. Give it a try now.

Assign the result of 5 + 7 to a new variable x by typing x <- 5 + 7. This is read as 'x gets 5 plus 7'. Give it a try.

x <- 5+7
| You are quite good my friend!

You are doing quite well.

  |===================                                                                      |  21%
| You'll notice that R did not print the result of 12 this time. When you use the assignment
| operator, R assumes that you don't want to see the result immediately, but rather that you
| intend to use the result for something else later on.

...

This time R does not print the result, 12. When you use the assignment operator, R assumes that you do not need to see the result now but intend to use it later.

|=====================                                                                    |  24%
| To view the contents of the variable x, just type x and press Enter. Try it now.

To view the contents of the variable x, just type x and press Enter. Try it.

x
## [1] 12
| Nice work!

Nice work.

  |=======================                                                                  |  26%
| Now, store the result of x - 3 in a new variable called y.

Now assign the result of x - 3 to a new variable y.

y <- x-3
| You got it!

That's right.

  |==========================                                                               |  29%
| What is the value of y? Type y to find out.

What is the value of y? Type y to find out.

y
## [1] 9
| You're the best!

You're the best.
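
The assignment steps so far can be replayed as a plain script. This is a small sketch for experimenting outside swirl; the variable w and the parentheses idiom on the last line are additions for illustration and do not appear in the lesson.

```r
x <- 5 + 7   # assignment: nothing is printed
x            # typing the name prints the stored value
y <- x - 3   # reuse the stored result
print(y)     # print() shows the value explicitly

# Illustration only: wrapping an assignment in parentheses
# assigns and prints in one step.
(w <- x * 2)
```

Storing the intermediate result in x means later expressions such as x - 3 do not need 5 + 7 to be typed again.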

  |============================                                                             |  32%
| Now, let's create a small collection of numbers called a vector. Any object that contains data
| is called a data structure and numeric vectors are the simplest type of data structure in R. In
| fact, even a single number is considered a vector of length one.

...

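Next, let's create a vector, a small collection of numbers. Any object that contains data is called a data structure in R (the word Object is used from here on). Numeric vectors are the simplest data structure in R. Even a single number is treated as a vector of length 1.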

  |==============================                                                           |  34%
| The easiest way to create a vector is with the c() function, which stands for 'concatenate' or
| 'combine'. To create a vector containing the numbers 1.1, 9, and 3.14, type c(1.1, 9, 3.14). Try
| it now and store the result in a variable called z.

The easiest way to create a vector is with the c() function. The c stands for 'concatenate' or 'combine'. To create a vector containing 1.1, 9, and 3.14, type c(1.1, 9, 3.14). This time, assign the result to a variable called z.

z <- c(1.1,9,3.14)
| You're the best!

You're the best.

  |=================================                                                        |  37%
| Anytime you have questions about a particular function, you can access R's built-in help files
| via the `?` command. For example, if you want more information on the c() function, type ?c
| without the parentheses that normally follow a function name. Give it a try.

Whenever you have a question about a particular function, you can view R's built-in help files with the ? command. For example, to learn about c(), type ?c, that is, the function name without the parentheses. Give it a try.

?c
| That's the answer I was looking for.

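Yes. That is the answer that was expected.

In the Help tab of the lower-right pane of RStudio, the documentation should appear under the title c {base}, Combine Values into a Vector or List. Typing c directly into the search box of the Help tab (the one with the magnifying-glass icon) displays the same page.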

  |===================================                                                      |  39%
| Type z to view its contents. Notice that there are no commas separating the values in the
| output.

Type z to display its contents. Note that no commas separate the values in the output.

z
## [1] 1.10 9.00 3.14
| You got it right!

That's right.

  |=====================================                                                    |  42%
| You can combine vectors to make a new vector. Create a new vector that contains z, 555, then z
| again in that order. Don't assign this vector to a new variable, so that we can just see the
| result immediately.

You can combine vectors to make a new vector. Create a vector that contains z, 555, and then z again, in that order. Do not assign it to a new variable, so that the result is shown immediately.

c(z,555,z)
## [1]   1.10   9.00   3.14 555.00   1.10   9.00   3.14
| You're the best!

You're the best.
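
To see how c() flattens its arguments into one vector, the following sketch checks the lengths involved. The name combined and the use of length() and [ ] indexing are illustrative additions; indexing is covered in a later lesson.

```r
z <- c(1.1, 9, 3.14)
combined <- c(z, 555, z)   # `combined` is an illustrative name

length(z)         # number of elements in z
length(combined)  # 3 + 1 + 3 elements
combined[4]       # the single value placed between the two copies of z
```

The result is one flat numeric vector, not a vector that contains other vectors.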

  |========================================                                                 |  45%
| Numeric vectors can be used in arithmetic expressions. Type the following to see what happens: z
| * 2 + 100.

Numeric vectors can be used in arithmetic expressions. Type z * 2 + 100 and see what happens.

z*2+100
## [1] 102.20 118.00 106.28
| Excellent work!

Excellent.

  |==========================================                                               |  47%
| First, R multiplied each of the three elements in z by 2. Then it added 100 to each element to
| get the result you see above.

...

R first multiplied each of the three elements of z by 2, then added 100 to each, which gives the result shown above.

  |============================================                                             |  50%
| Other common arithmetic operators are `+`, `-`, `/`, and `^` (where x^2 means 'x squared'). To
| take the square root, use the sqrt() function and to take the absolute value, use the abs()
| function.

...

Other common arithmetic operators are +, -, /, and ^ (x^2 means 'x squared'). To take a square root, use the sqrt() function; to take an absolute value, use abs().
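
The operators and functions just listed all work element by element on a vector. A brief sketch, using made-up values in a vector v that is not part of the lesson:

```r
v <- c(-4, 9, 16)   # illustrative values, not from the lesson

squares <- v^2          # each element squared
magnitudes <- abs(v)    # absolute values
roots <- sqrt(abs(v))   # square roots of the absolute values
halves <- v / 2         # each element divided by 2

roots
```

abs() is applied before sqrt() here because sqrt() of a negative number returns NaN with a warning.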

  |===============================================                                          |  53%
| Take the square root of z - 1 and assign it to a new variable called my_sqrt.

Take the square root of z - 1 and assign it to a new variable called my_sqrt.

my_sqrt <- sqrt(z-1)
| That's a job well done!

Well done.

  |=================================================                                        |  55%
| Before we view the contents of the my_sqrt variable, what do you think it contains?

1: a vector of length 3
2: a vector of length 0 (i.e. an empty vector)
3: a single number (i.e a vector of length 1)

Selection: 1

What do you think the variable my_sqrt contains?

1: a vector of length 3
2: a vector of length 0 (i.e. an empty vector)
3: a single number (i.e. a vector of length 1)

Selection: 1 (entered by SL)

| You're the best!

You're the best.

  |====================================================                                     |  58%
| Print the contents of my_sqrt.

Print the contents of my_sqrt.

my_sqrt
## [1] 0.3162278 2.8284271 1.4628739
| All that practice is paying off!

The practice is paying off.

  |======================================================                                   |  61%
| As you may have guessed, R first subtracted 1 from each element of z, then took the square root
| of each element. This leaves you with a vector of the same length as the original vector z.

...

As you may have guessed, R first subtracted 1 from each element of z and then took the square root of each. The result is a vector of the same length as the original vector z.

  |========================================================                                 |  63%
| Now, create a new variable called my_div that gets the value of z divided by my_sqrt.

Now divide z by my_sqrt and assign the result to a new variable called my_div.

my_div <- z/my_sqrt
| All that hard work is paying off!

The hard work is paying off.

  |===========================================================                              |  66%
| Which statement do you think is true?

1: The first element of my_div is equal to the first element of z divided by the first element of my_sqrt, and so on...
2: my_div is a single number (i.e a vector of length 1)
3: my_div is undefined

Selection: 1

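Which statement do you think is true?

1: The first element of my_div is equal to the first element of z divided by the first element of my_sqrt, and so on for the remaining elements.
2: my_div is a single number (i.e. a vector of length 1)
3: my_div is undefined

Selection: 1 (entered by SL)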

| Keep up the great work!

Keep up the great work.

  |=============================================================                            |  68%
| Go ahead and print the contents of my_div.

Now print the contents of my_div.

my_div
## [1] 3.478505 3.181981 2.146460
| You're the best!

You're the best.

  |===============================================================                          |  71%
| When given two vectors of the same length, R simply performs the specified arithmetic operation
| (`+`, `-`, `*`, etc.) element-by-element. If the vectors are of different lengths, R 'recycles'
| the shorter vector until it is the same length as the longer vector.

...

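Given two vectors of the same length, R simply performs the specified arithmetic operation (+, -, *, etc.) element by element. If the lengths differ, R recycles the elements of the shorter vector until it is as long as the longer one.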

  |==================================================================                       |  74%
| When we did z * 2 + 100 in our earlier example, z was a vector of length 3, but technically 2
| and 100 are each vectors of length 1.

...

When we computed z * 2 + 100 earlier, z was a vector of length 3. Technically, 2 and 100 are each vectors of length 1, and they were recycled.

  |====================================================================                     |  76%
| Behind the scenes, R is 'recycling' the 2 to make a vector of 2s and the 100 to make a vector of
| 100s. In other words, when you ask R to compute z * 2 + 100, what it really computes is this: z
| * c(2, 2, 2) + c(100, 100, 100).

...

Behind the scenes, R recycles the 2 into a vector of 2s and the 100 into a vector of 100s. In other words, when you compute z * 2 + 100, what R actually computes is z * c(2, 2, 2) + c(100, 100, 100).
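
The equivalence described above can be checked directly. The following is a small sketch; the names short_form, long_form, and explicit are made up for illustration, and rep() is used only to write the recycling out by hand (R does not literally call rep() internally).

```r
z <- c(1.1, 9, 3.14)

short_form <- z * 2 + 100
long_form  <- z * c(2, 2, 2) + c(100, 100, 100)

# all.equal() compares numeric vectors with a small tolerance.
all.equal(short_form, long_form)

# Spelling the recycling out with rep(), for illustration only.
explicit <- z * rep(2, length(z)) + rep(100, length(z))
all.equal(short_form, explicit)
```

Both comparisons should report that the vectors are equal, which is the point the lesson makes about recycling length-1 vectors.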

  |======================================================================                   |  79%
| To see another example of how this vector 'recycling' works, try adding c(1, 2, 3, 4) and c(0,
| 10). Don't worry about saving the result in a new variable.

Let's look at another example of how vector recycling works. Add c(1, 2, 3, 4) and c(0, 10). There is no need to assign the result to a new variable.

c(1,2,3,4) + c(0,10)
## [1]  1 12  3 14
| Nice work!

Nice work.

  |=========================================================================                |  82%
| If the length of the shorter vector does not divide evenly into the length of the longer vector,
| R will still apply the 'recycling' method, but will throw a warning to let you know something
| fishy might be going on.

...

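Even when the length of the shorter vector does not divide evenly into the length of the longer one, R still recycles, but it issues a warning to tell you that something fishy may be going on.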

  |===========================================================================              |  84%
| Try c(1, 2, 3, 4) + c(0, 10, 100) for an example.

For example, try c(1, 2, 3, 4) + c(0, 10, 100).

c(1, 2, 3, 4) + c(0, 10, 100)
## Warning in c(1, 2, 3, 4) + c(0, 10, 100): longer object length is not a multiple of
## shorter object length
## [1]   1  12 103   4
| All that practice is paying off!

The practice is paying off.
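
If you want a script to notice this warning rather than just print it, one option is to intercept it. This is a sketch under the assumption that recording the warning in a flag is all that is needed; the names warned and result are illustrative, and withCallingHandlers() is not covered in the lesson.

```r
warned <- FALSE

result <- withCallingHandlers(
  c(1, 2, 3, 4) + c(0, 10, 100),
  warning = function(w) {
    warned <<- TRUE                  # remember that a warning was raised
    invokeRestart("muffleWarning")   # then continue without printing it
  }
)

result  # the recycled sum is still computed
warned
```

The fourth element is 4 + 0 because recycling restarts the shorter vector from its first element.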

  |=============================================================================            |  87%
| Before concluding this lesson, I'd like to show you a couple of time-saving tricks.

Before finishing this lesson, here are a few time-saving tricks.

  |================================================================================         |  89%
| Earlier in the lesson, you computed z * 2 + 100. Let's pretend that you made a mistake and that
| you meant to add 1000 instead of 100. You could either re-type the expression, or...

Earlier in this lesson you computed z * 2 + 100. Suppose you really meant 1000 instead of 100. You could type the expression again, or...

  |==================================================================================       |  92%
| In many programming environments, the up arrow will cycle through previous commands. Try hitting
| the up arrow on your keyboard until you get to this command (z * 2 + 100), then change 100 to
| 1000 and hit Enter. If the up arrow doesn't work for you, just type the corrected command.

In many programming environments, the up arrow recalls previous commands. Press the up arrow on your keyboard until z * 2 + 100 appears, change 100 to 1000, and press Enter. If the up arrow does not work for you, just type the corrected command.

z*2+1000
## [1] 1002.20 1018.00 1006.28
| Your dedication is inspiring!

Your dedication is inspiring.

  |====================================================================================     |  95%
| Finally, let's pretend you'd like to view the contents of a variable that you created earlier,
| but you can't seem to remember if you named it my_div or myDiv. You could try both and see what
| works, or...

...

Finally, suppose you want to view a variable you created earlier but cannot remember whether you named it my_div or myDiv. You could try both, or...

  |=======================================================================================  |  97%
| You can type the first two letters of the variable name, then hit the Tab key (possibly more
| than once). Most programming environments will provide a list of variables that you've created
| that begin with 'my'. This is called auto-completion and can be quite handy when you have many
| variables in your workspace. Give it a try. (If auto-completion doesn't work for you, just type
| my_div and press Enter.)

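Type the first two letters of the variable name and press the Tab key (possibly more than once). Most programming environments then list the variables that begin with 'my'. This is called auto-completion, and it is handy when you work with many variables. Give it a try. (If auto-completion does not work for you, just type my_div and press Enter.)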

my_div
## [1] 3.478505 3.181981 2.146460
| That's the answer I was looking for.

Yes. That is the answer that was expected.

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No

Selection: 2

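Would you like to receive Coursera.org credit for completing this course?

If you enter 1, you are asked to enter a two-digit token (proof of completion). This was presumably used by Coursera, one of the MOOC platforms.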


| Nice work!

Nice work.

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

You have reached the end of lesson 1. swirl returns to the main menu.

Please choose a course. To exit swirl, type 0.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0 (entered by SL)

| Leaving swirl now. Type swirl() to resume.

swirl is exiting. To resume, type swirl().

2.2. Workspace and Files

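Readers familiar with a file system such as Unix's will be fine, but others may find this lesson confusing, so it is explained carefully.
What this lesson covers is handy to know once you start real programming, makes things easier to understand, and becomes essential as programs grow more complex. When you use R through RStudio, however, some of it can be done instead with the Files tab of the lower-right pane or with the RStudio menus.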

library(swirl)

Start here whenever you begin a swirl session. (If swirl is not installed, read section 1.2. If you are continuing straight on from 1: Basic Building Blocks, this step is not needed.)

| Hi! I see that you have some variables saved in your workspace. To keep things running smoothly,
| I recommend you clean up before starting swirl.

| Type ls() to see a list of the variables in your workspace. Then, type rm(list=ls()) to clear
| your workspace.

| Type swirl() when you are ready to begin.

It seems you have some variables saved in your workspace. (Some readers may not.) To keep swirl running smoothly, clearing them before starting swirl is recommended.

Type ls() to list the variables in your workspace. To clear them, type rm(list=ls()).

Type swirl() to begin.
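
To try ls() and rm(list=ls()) without wiping your real workspace, you can practise on a separate environment. This is a sketch: the environment ws is a stand-in created for illustration, and passing envir = to rm() is what keeps the global workspace untouched.

```r
ws <- new.env()              # stand-in for the workspace (illustration only)
assign("x", 12, envir = ws)
assign("y", 9, envir = ws)

before <- ls(ws)             # names currently stored in ws
rm(list = ls(ws), envir = ws)  # remove everything ls() reported
after <- ls(ws)              # nothing is left

before
after
```

In the Console, the same pattern without the envir argument, rm(list=ls()), clears the workspace itself, which is what swirl recommends above.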

> ls()
[1] "my_div"  "my_sqrt" "x"       "y"       "z"
> rm(list=ls())
> swirl()
  • These are the variables used in 2.1, R Programming 1: Basic Building Blocks. They were still in the workspace.
| Welcome to swirl! Please sign in. If you've been here before, use the same name as you did then.
| If you are new, call yourself something unique.

What shall I call you? SL

Welcome to swirl. Please sign in with a name. If you have been here before, use the same name as you did then. If you are new, enter a name that is unique to you.

What shall I call you? SL (entered by SL)

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

Please choose a course. To exit swirl, type 0.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1 (entered by SL)

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 2

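Choose the lesson to start with. Typing 0 returns you to the course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers
 4: Vectors                    5: Missing Values             6: Subsetting Vectors
 7: Matrices and Data Frames   8: Logic                      9: Functions
10: lapply and sapply         11: vapply and tapply         12: Looking at Data
13: Simulation                14: Dates and Times           15: Base Graphics

Selection: 2 (entered by SL)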

  |                                                                                         |   0%

| In this lesson, you'll learn how to examine your local workspace in R and begin to explore the
| relationship between your workspace and the file system of your machine.

...

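In this lesson, you learn how to examine your workspace in R and begin to explore the relationship between the workspace and the file system of your computer. (When you have read up to ..., press Enter.)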

  |==                                                                                       |   3%
| Because different operating systems have different conventions with regards to things like file
| paths, the outputs of these commands may vary across machines.

...

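Different operating systems (Windows, macOS, Linux) have different conventions for things like file paths (the 'path' that leads to a file), so the output of these commands may vary from machine to machine. (When you have read up to ..., press Enter. This reminder is omitted below.)

On Windows and Mac, where a GUI (Graphical User Interface) is the norm, people say folder; in file-system terms the same thing is called a directory. Unix also treats a directory as a file, but that is not needed for now. Whenever you see directory, think of a Windows or Mac folder. The documents inside it (including image files and so on) are files.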

  |=====                                                                                    |   5%
| However it's important to note that R provides a common API (a common set of commands) for
| interacting with files, that way your code will work across different kinds of computers.

...

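R, however, provides a common API (Application Programming Interface, a common set of commands) for working with files that does not depend on the operating system, so your code works across different kinds of computers.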
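
One concrete instance of such an operating-system-independent command is file.path(), which builds a path from its parts. This function is not used in the lesson; it is shown here only as an illustration, with testdir and mytest.R as example names.

```r
# Build a path from components instead of typing separators by hand.
p <- file.path("testdir", "mytest.R")

p            # the joined path
basename(p)  # the file-name part
dirname(p)   # the directory part
```

Building paths this way avoids hard-coding a separator that might differ between systems.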

  |=======                                                                                  |   8%
| Let's jump right in so you can get a feel for how these special functions work!

...

Let's jump right in and get a feel for how these functions work.

  |=========                                                                                |  10%
| Determine which directory your R session is using as its current working directory using
| getwd().

Use getwd() to display the current working directory.

The > is the R prompt; you do not need to type it. The output is hidden here.

> getwd()
[1] *********

| All that practice is paying off!

The practice is paying off.

  |===========                                                                              |  13%
| List all the objects in your local workspace using ls().

Use ls() to list the objects (saved variables and so on) in your workspace.

> ls()
character(0)

| That's correct!

That's correct.

  |==============                                                                           |  15%
| Some R commands are the same as their equivalents commands on Linux or on a Mac. Both Linux and
| Mac operating systems are based on an operating system called Unix. It's always a good idea to
| learn more about Unix!

...

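Some R commands are the same as their equivalents on Linux or on a Mac. Both Linux and Mac operating systems are based on an operating system called Unix. Learning more about Unix is always a good idea.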

  |================                                                                         |  18%
| Assign 9 to x using x <- 9.

Assign 9 to the variable x.

> x <- 9

| Perseverance, that's the answer.

Well persevered. That's the answer.

  |==================                                                                       |  21%
| Now take a look at objects that are in your workspace using ls().

Use ls() to see which objects are now in your workspace.

> ls()
[1] "x"

| You're the best!

You're the best.

  |=====================                                                                    |  23%
| List all the files in your working directory using list.files() or dir().

Use list.files() or dir() to list the files in your working directory.

> list.files()
[1] *********

| You are amazing!

You are amazing.

  |=======================                                                                  |  26%
| As we go through this lesson, you should be examining the help page for each new function. Check
| out the help page for list.files with the command ?list.files.

As you go through this lesson, check the help page for each new function that appears. Start by looking up list.files with ?list.files.

If you are using RStudio, the documentation appears in the Help tab of the lower-right pane.

> ?list.files

| Keep working like that and you'll get there!

Keep working like that.

  |=========================                                                                |  28%
| One of the most helpful parts of any R help file is the See Also section. Read that section for
| list.files. Some of these functions may be used in later portions of this lesson.

...

The See Also section of an R help file is very useful, so read that section for list.files. Some of the functions listed there may be used later in this lesson.

  |===========================                                                              |  31%
| Using the args() function on a function name is also a handy way to see what arguments a
| function can take.
...

args() displays the arguments of a function, that is, the inputs the function can take when it is called.

  |==============================                                                           |  33%
| Use the args() function to determine the arguments to list.files().

Use args() to display the arguments of list.files().

> args(list.files)
function (path = ".", pattern = NULL, all.files = FALSE, full.names = FALSE, 
    recursive = FALSE, ignore.case = FALSE, include.dirs = FALSE, 
    no.. = FALSE) 
NULL

| Keep working like that and you'll get there!

Keep working like that.

It is also worth looking up args() itself in Help (in RStudio, type args into the Help tab of the lower-right pane).
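
args() prints the argument list for reading; if you want the same information as data you can work with, formals() is one option. This is a sketch that goes slightly beyond the lesson: formals() and the name arg_names are not part of the swirl log.

```r
# Argument names of list.files() as a character vector.
arg_names <- names(formals(list.files))
arg_names

# The default value of the `path` argument.
default_path <- formals(list.files)$path
default_path
```

The default path of "." means the current working directory, which is why list.files() with no arguments lists the files there.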

  |================================                                                         |  36%
| Assign the value of the current working directory to a variable called "old.dir".

Assign the current working directory to a variable called old.dir.

> old.dir <- getwd()

| Great job!

Great job.

  |==================================                                                       |  38%
| We will use old.dir at the end of this lesson to move back to the place that we started. A lot
| of query functions like getwd() have the useful property that they return the answer to the
| question as a result of the function.

...

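We will use old.dir at the end of this lesson to return to where we started. Many query functions such as getwd() return the answer to the question as the function's result, which is what makes it possible to store that answer in a variable.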

  |=====================================                                                    |  41%
| Use dir.create() to create a directory in the current working directory called "testdir".

Use dir.create() to create a new directory named "testdir" in the current working directory.

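From here on, these notes gradually switch to the English technical terms (Directory, File, and so on) so that they become familiar.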

> dir.create("testdir")

| Great job!

Well done.

  |=======================================                                                  |  44%
| We will do all our work in this new directory and then delete it after we are done. This is the
| R analog to "Take only pictures, leave only footprints."

...

From now on all the work happens in this new directory, which will be deleted once we are finished. It is the R version of the sightseeing motto "Take only pictures, leave only footprints."

  |=========================================                                                |  46%
| Set your working directory to "testdir" with the setwd() command.

Use setwd() to make "testdir" the working directory.

> setwd("testdir")

| Excellent work!

Excellent.

  |===========================================                                              |  49%
| In general, you will want your working directory to be someplace sensible, perhaps created for
| the specific project that you are working on. In fact, organizing your work in R packages using
| RStudio is an excellent option. Check out RStudio at http://www.rstudio.com/

...

In general, the working directory should be somewhere sensible, for example a directory created for the specific project you are working on. Organizing your work as R packages with RStudio is an excellent option. See http://www.rstudio.com/ for more about RStudio.

  |==============================================                                           |  51%
| Create a file in your working directory called "mytest.R" using the file.create() function.

Use the file.create() function to create a file named "mytest.R" in your working directory.

> file.create("mytest.R")
[1] TRUE

| You're the best!

You're the best.

  |================================================                                         |  54%
| This should be the only file in this newly created directory. Let's check this by listing all
| the files in the current directory.

This file should be the only one in the newly created directory. List all the files in the current directory to check.

> list.files()
[1] "mytest.R"

| You got it!

Exactly.

  |==================================================                                       |  56%
| Check to see if "mytest.R" exists in the working directory using the file.exists() function.

Use the file.exists() function to check whether a file named "mytest.R" exists in the working directory.

> file.exists("mytest.R")
[1] TRUE

| That's the answer I was looking for.

Yes, that is the expected answer.

  |====================================================                                     |  59%
| These sorts of functions are excessive for interactive use. But, if you are running a program
| that loops through a series of files and does some processing on each one, you will want to
| check to see that each exists before you try to process it.

...

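Functions like these are more than you need when working interactively. But in a program that loops over a series of files and processes each one, you will want to check that each file exists before trying to process it.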
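The situation described above can be sketched as follows. This is only an illustration under stated assumptions: the file names and the variables work_dir, candidates, and processed are hypothetical, and the "processing" step merely records the file name. Everything happens in a temporary directory so the working directory is not touched.

```r
# Work in a temporary directory so nothing in the working directory changes
work_dir <- tempfile("swirl_demo_")
dir.create(work_dir)

# Create two of the three files we are about to look for
file.create(file.path(work_dir, c("a.txt", "b.txt")))

candidates <- file.path(work_dir, c("a.txt", "b.txt", "missing.txt"))
processed <- character(0)

for (f in candidates) {
  if (!file.exists(f)) {
    message("skipping (not found): ", basename(f))
    next
  }
  # Stand-in for real processing: record the file name
  processed <- c(processed, basename(f))
}

processed

# Clean up
unlink(work_dir, recursive = TRUE)
```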

  |=======================================================                                  |  62%
| Access information about the file "mytest.R" by using file.info().

Use file.info() to get information about the file "mytest.R".

Part of the output below has been masked.

> file.info("mytest.R")
         size isdir mode               mtime               ctime               atime uid gid
mytest.R    0 FALSE  644 2020-03-31 17:51:09 2020-03-31 17:51:09 2020-03-31 17:51:10 501  20
           uname grname
mytest.R ******  *******

| Your dedication is inspiring!

Inspiring.

  |=========================================================                                |  64%
| You can use the $ operator --- e.g., file.info("mytest.R")$mode --- to grab specific items.

...

The $ operator can be used to pull out a specific item. For example, file.info("mytest.R")$mode returns only the mode.
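A small sketch of the $ operator on the result of file.info(), using a temporary file instead of mytest.R so it can be run anywhere. The variable names tmp and info are made up for this illustration.

```r
tmp <- tempfile(fileext = ".R")
file.create(tmp)

info <- file.info(tmp)   # a data frame with one row per file
info$size                # size in bytes; 0 for a newly created empty file
info$isdir               # FALSE, because tmp is a regular file
class(info$mtime)        # the modification time is stored as a date-time

file.remove(tmp)
```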

  |===========================================================                              |  67%
| Change the name of the file "mytest.R" to "mytest2.R" by using file.rename().

Use the file.rename() function to change the name of the file "mytest.R" to "mytest2.R".

> file.rename("mytest.R", "mytest2.R")
[1] TRUE

| That's the answer I was looking for.

Yes, that is the expected answer.

  |==============================================================                           |  69%
| Your operating system will provide simpler tools for these sorts of tasks, but having the
| ability to manipulate files programatically is useful. You might now try to delete mytest.R
| using file.remove('mytest.R'), but that won't work since mytest.R no longer exists. You have
| already renamed it.

...

Your operating system may offer simpler tools for tasks like these, but being able to manipulate files from a program is useful. You could now try to delete mytest.R with file.remove('mytest.R'), but that will not work: the file has already been renamed, so mytest.R no longer exists.
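What "will not work" looks like can be checked with a short sketch. The file name below is hypothetical and deliberately does not exist; the warning is suppressed only to keep the output short.

```r
missing_file <- file.path(tempdir(), "no_such_file_for_demo.R")

# file.remove() returns FALSE (and emits a warning) when the file cannot be removed
result <- suppressWarnings(file.remove(missing_file))
result

# Guarding with file.exists() avoids the warning altogether
if (file.exists(missing_file)) file.remove(missing_file)
```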

  |================================================================                         |  72%
| Make a copy of "mytest2.R" called "mytest3.R" using file.copy().

Use file.copy() to make a copy of "mytest2.R" named "mytest3.R".

> file.copy("mytest2.R", "mytest3.R")
[1] TRUE

| You are doing so well!

You are doing very well.

  |==================================================================                       |  74%
| You now have two files in the current directory. That may not seem very interesting. But what if
| you were working with dozens, or millions, of individual files? In that case, being able to
| programatically act on many files would be absolutely necessary. Don't forget that you can,
| temporarily, leave the lesson by typing play() and then return by typing nxt().

...

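There are now two files in the current directory. That may not seem very interesting, but with dozens or even millions of files, being able to act on them from a program becomes essential. Remember that you can leave the lesson temporarily by typing play() and come back by typing nxt().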

Such a break is a good time to review or to look at the Help pages of a few functions.
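To make "acting on many files from a program" concrete, here is a minimal sketch that relies on the file functions accepting vectors of names. The directory and file names are hypothetical and live under a temporary directory.

```r
demo_dir <- tempfile("many_files_")
dir.create(demo_dir)

# file.create() is vectorized: one call creates all ten files
old_names <- file.path(demo_dir, sprintf("data_%02d.txt", 1:10))
created <- file.create(old_names)

# Rename all of them at once, changing the extension from .txt to .csv
new_names <- sub("\\.txt$", ".csv", old_names)
renamed <- file.rename(old_names, new_names)

csv_files <- list.files(demo_dir, pattern = "\\.csv$")
length(csv_files)

unlink(demo_dir, recursive = TRUE)
```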

  |====================================================================                     |  77%
| Provide the relative path to the file "mytest3.R" by using file.path().

Use file.path() to give the relative path (the path from the current directory) of "mytest3.R".

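A path is the "route" that leads to a directory or file. Depending on where the route starts, it is called a relative path or an absolute path (also full path). An absolute path starts from the root, the topmost location on the computer. With a clear notion of the working directory you can work with paths relative to it, and the same program can then be used on another computer.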

> file.path("mytest3.R")
[1] "mytest3.R"

| You nailed it! Good job!

Nailed it.

  |=======================================================================                  |  79%
| You can use file.path to construct file and directory paths that are independent of the
| operating system your R code is running on. Pass 'folder1' and 'folder2' as arguments to
| file.path to make a platform-independent pathname.

With file.path you can build file and directory paths that do not depend on the operating system R is running on. Pass 'folder1' and 'folder2' as arguments to file.path to make a platform-independent path name.

> file.path('folder1', 'folder2')
[1] "folder1/folder2"

| Excellent work!

Excellent.
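A few more uses of file.path(), as a sketch. The folder and file names are placeholders; basename() and dirname() are base R functions that take a path apart again.

```r
# file.path() joins its arguments with "/", which R also accepts on Windows
p <- file.path("folder1", "folder2", "data.csv")
p

# basename() and dirname() split the path into file name and directory
basename(p)
dirname(p)

# file.path() is vectorized over its arguments
file.path("folder1", c("a.R", "b.R"))
```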

  |=========================================================================                |  82%
| Take a look at the documentation for dir.create by entering ?dir.create . Notice the 'recursive'
| argument. In order to create nested directories, 'recursive' must be set to TRUE.

Look at the documentation for dir.create by entering ?dir.create. Notice the 'recursive' argument. To create nested directories (a directory inside a directory), 'recursive' must be set to TRUE.

> ?dir.create

| Your dedication is inspiring!

Inspiring.

  |===========================================================================              |  85%
| Create a directory in the current working directory called "testdir2" and a subdirectory for it
| called "testdir3", all in one command by using dir.create() and file.path().

Create a directory named "testdir2" in the current working directory and a subdirectory "testdir3" inside it. Combining dir.create() and file.path() does this in a single command.

> dir.create(file.path("testdir2","testdir3"), recursive = TRUE)

| You're the best!

You're the best.
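Why recursive = TRUE matters can be seen in a short sketch run under a temporary directory. The variable names base_dir, nested, first_try, and second_try are made up for this illustration.

```r
base_dir <- tempfile("nested_")
nested <- file.path(base_dir, "testdir2", "testdir3")

# Without recursive = TRUE the missing parent directories are not created,
# so the call fails: it returns FALSE and emits a warning
first_try <- suppressWarnings(dir.create(nested))

# With recursive = TRUE all missing parents are created as well
second_try <- dir.create(nested, recursive = TRUE)

nested_exists <- dir.exists(nested)
nested_exists

unlink(base_dir, recursive = TRUE)
```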

  |==============================================================================           |  87%
| Go back to your original working directory using setwd(). (Recall that we created the variable
| old.dir with the full path for the orginal working directory at the start of these questions.)

Go back to the original working directory with setwd(). (Its full path was saved in old.dir at the start.)

> setwd(old.dir)

| Keep up the great work!

Keep it up.

  |================================================================================         |  90%
| It is often helpful to save the settings that you had before you began an analysis and then go
| back to them at the end. This trick is often used within functions; you save, say, the par()
| settings that you started with, mess around a bunch, and then set them back to the original
| values at the end. This isn't the same as what we have done here, but it seems similar enough to
| mention.

...

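It is often helpful to save the settings you had before starting an analysis and to restore them at the end. This trick is often used inside functions: you save, say, the par() settings you started with, change them freely, and set them back to the original values at the end. That is not the same as what we did here, but it is similar enough to mention.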
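The save-and-restore idea can be sketched for the working directory as follows. list_in_dir is a hypothetical helper written for this note, not part of swirl or base R; it relies on setwd() returning the previous directory and on on.exit() to restore it.

```r
# Hypothetical helper: lists the files in `dir` and always restores
# the caller's working directory afterwards
list_in_dir <- function(dir) {
  old_dir <- setwd(dir)      # setwd() invisibly returns the previous directory
  on.exit(setwd(old_dir))    # restore it even if an error occurs below
  list.files()
}

before <- getwd()
files <- list_in_dir(tempdir())
after <- getwd()

identical(before, after)
```

The same pattern is commonly used with graphics settings, because par() also returns the previous values when settings are changed.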

  |==================================================================================       |  92%
| After you finish this lesson delete the 'testdir' directory that you just left (and everything
| in it)

...

After finishing this lesson, delete the 'testdir' directory you just left, together with everything in it.
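The lesson text does not show a command for the clean-up at this point. One way to do it, sketched here on a stand-in directory under tempdir() rather than on the real testdir, is unlink() with recursive = TRUE. The name testdir_demo is hypothetical.

```r
# Recreate a small stand-in for the lesson's directory under tempdir()
demo_dir <- file.path(tempdir(), "testdir_demo")
dir.create(file.path(demo_dir, "testdir2", "testdir3"), recursive = TRUE)
file.create(file.path(demo_dir, "mytest2.R"))

# unlink() with recursive = TRUE removes the directory and everything in it
status <- unlink(demo_dir, recursive = TRUE)
status                 # 0 indicates success
dir.exists(demo_dir)
```

For the lesson itself, the corresponding call would presumably be unlink("testdir", recursive = TRUE), run from the original working directory.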

  |====================================================================================     |  95%
| Take nothing but results. Leave nothing but assumptions. That sounds like 'Take nothing but
| pictures. Leave nothing but footprints.' But it makes no sense! Surely our readers can come up
| with a better motto . . .

...

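Take nothing but results. Leave nothing but assumptions. That sounds like the sightseeing motto "Take nothing but pictures. Leave nothing but footprints." But it makes no sense. Surely readers can come up with a better motto.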

  |=======================================================================================  |  97%
| In this lesson, you learned how to examine your R workspace and work with the file system of
| your machine from within R. Thanks for playing!

...

In this lesson you learned how to examine your R workspace and how to work with your machine's file system from within R.

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

Would you like to receive credit on Coursera.org for completing this course?

Note: at the end of each Lesson of the beginner course R Programming, the message 'Would you like to receive credit for completing this course on Coursera.org?' appears. Select 1: No. (See the swirl Course Repository.)

| Excellent work!

Excellent.

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

You have reached the end of Lesson 2. swirl returns to the main menu.

Choose a course, or enter 0 to exit swirl.

1: R Programming
2: Go to the `swirl` course repository

Selection: 0 (entered by SL)

| Leaving swirl now. Type swirl() to resume.

Leaving swirl. Type swirl() to resume.

> 

2.3. Sequences of Numbers

This lesson practices creating sequences of numbers. The material is basic, so do not skip it unless you already know it well.
The walkthrough starts at swirl(). Installing swirl and running library(swirl) are omitted; if you are unsure about them, review the beginning of 2.2 or 2.1.

> swirl()

The next steps are the same as before, so translation and commentary are omitted.

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you did then.
| If you are new, call yourself something unique.

What shall I call you? SL

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 3

3: Sequences of Numbers (Lesson 3) was selected.

  |                                                                                         |   0%

| In this lesson, you'll learn how to create sequences of numbers in R.

...

This lesson shows how to create sequences of numbers in R.

  |====                                                                                     |   4%
| The simplest way to create a sequence of numbers in R is by using the `:` operator. Type 1:20 to
| see how it works.

The simplest way to create a sequence of numbers in R is the `:` operator. Type 1:20 to see how it works.

> 1:20
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

| That's the answer I was looking for.

Yes, that is the expected answer.

  |========                                                                                 |   9%
| That gave us every integer between (and including) 1 and 20. We could also use it to create a
| sequence of real numbers. For example, try pi:10.

The result is every integer from 1 to 20, both ends included. The operator also works with real numbers. Try pi:10.

> pi:10
[1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593

| That's correct!

Correct.

|============                                                                             |  13%
| The result is a vector of real numbers starting with pi (3.142...) and increasing in increments
| of 1. The upper limit of 10 is never reached, since the next number in our sequence would be
| greater than 10.

...

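The result is a vector (an ordered collection of numbers) that starts at pi (3.142...) and increases in steps of 1. The upper limit 10 is never reached, because the next number in the sequence would be greater than 10.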
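The stopping rule just described can be checked directly. This is a small sketch; the variable name x is arbitrary.

```r
x <- pi:10
length(x)        # 7 elements
max(x) <= 10     # TRUE: the last element does not exceed the upper limit

# The same rule with a simpler starting value
1.5:5            # 1.5 2.5 3.5 4.5
```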

|===============                                                                          |  17%
| What happens if we do 15:1? Give it a try to find out.

What about 15:1? Try it.

Here a deliberate mistake was made: 17 was entered instead of 1.

> 15:17
[1] 15 16 17
| Give it another try. Or, type info() for more options.

| Type 15:1 and press Enter.

Try again, or type info() for more options. Type 15:1 and press Enter.

So info() was tried first.

> info()

| When you are at the R prompt (>):
| -- Typing skip() allows you to skip the current question.
| -- Typing play() lets you experiment with R on your own; swirl will ignore what you do...
| -- UNTIL you type nxt() which will regain swirl's attention.
| -- Typing bye() causes swirl to exit. Your progress will be saved.
| -- Typing main() returns you to swirl's main menu.
| -- Typing info() displays these options again.

Remember? This explanation appeared at the very beginning. For a walkthrough of these options, see 1.3.

> 15:1
 [1] 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1

| Keep working like that and you'll get there!

That's the way. Keep going.

|===================                                                                      |  22%
| It counted backwards in increments of 1! It's unlikely we'd want this behavior, but nonetheless
| it's good to know how it could happen.
...

The sequence was generated backwards, decreasing by 1 at each step. This is rarely what you want, but it is good to know how it can happen.

|=======================                                                                  |  26%
| Remember that if you have questions about a particular R function, you can access its
| documentation with a question mark followed by the function name: ?function_name_here. However,
| in the case of an operator like the colon used above, you must enclose the symbol in backticks
| like this: ?`:`. (NOTE: The backtick (`) key is generally located in the top left corner of a
| keyboard, above the Tab key. If you don't have a backtick key, you can use regular quotes.)

...

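Recall that help on an R function is available by typing ? followed by the function name. For an operator such as the colon (:), however, the symbol has to be enclosed in backticks (`), as in ?`:`. The backtick key is usually in the top left corner of the keyboard, above the Tab key. If there is no backtick key, regular quotes can be used instead.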

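In a quick trial, other quotation marks did not seem to be accepted here. Also, when typing into the search box (the magnifying glass) of the Help tab in RStudio's lower-right pane, a plain : seems to work best. The backtick is important, so here is a short explanation. The description above applies to the layout usually called the ASCII or US keyboard. In Japan, JIS keyboards are more common (some ASCII-layout keyboards also carry hiragana; if the backtick is in the top left, above Tab, the layout is ASCII). On a JIS keyboard the backtick sits to the right of P, on the @ key, so it is typed as Shift + @.

A few words on quotation marks as well; they will be revisited once they are used heavily. As already seen, there is the single quote (') and the double quote ("). In R, treat the two as basically equivalent.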
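Both remarks can be tried at the prompt. The sketch below assumes nothing beyond base R: quoting a string either way gives the same value, and backticks name the operator itself so that it can be called like an ordinary function.

```r
# Single and double quotes create the same character string
identical('swirl', "swirl")

# Backticks refer to the operator itself, so it can be called like any function
`:`(1, 5)

# The same works for other operators
`+`(2, 3)
```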

|===========================                                                              |  30%
| Pull up the documentation for `:` now.

Bring up the documentation for `:` now.

The `:` can follow the ? immediately, but here one space was inserted, and that causes no problem. Experiment freely; nothing terrible will happen. If in doubt, type play() to step out of swirl, practice, and come back with nxt().

> ? `:`

| Excellent work!

Excellent.

|===============================                                                          |  35%
| Often, we'll desire more control over a sequence we're creating than what the `:` operator gives
| us. The seq() function serves this purpose.

...

Often more control over a sequence is needed than the `:` operator offers. The seq() function serves that purpose.

|===================================                                                      |  39%
| The most basic use of seq() does exactly the same thing as the `:` operator. Try seq(1, 20) to
| see this.

In its most basic form seq() does exactly what `:` does. Try seq(1, 20).

> seq(1,20)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

| All that practice is paying off!

All that practice is paying off.

|=======================================                                                  |  43%
| This gives us the same output as 1:20. However, let's say that instead we want a vector of
| numbers ranging from 0 to 10, incremented by 0.5. seq(0, 10, by=0.5) does just that. Try it out.

That gives the same output as 1:20. Now suppose we want the numbers from 0 to 10 in steps of 0.5. seq(0, 10, by=0.5) does that. Try it.

> seq(0, 10, by=0.5)
 [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0
[20]  9.5 10.0

| You are amazing!

Amazing.

|===========================================                                              |  48%
| Or maybe we don't care what the increment is and we just want a sequence of 30 numbers between 5
| and 10. seq(5, 10, length=30) does the trick. Give it a shot now and store the result in a new
| variable called my_seq.

Or perhaps the increment does not matter and we simply want a sequence of 30 numbers between 5 and 10. seq(5, 10, length=30) does that. Try it and store the result in a new variable named my_seq.

> my_seq <- seq(5, 10, length=30)

| Keep up the great work!

Keep it up.

|==============================================                                           |  52%
| To confirm that my_seq has length 30, we can use the length() function. Try it now.

Use the length() function to confirm that my_seq really has length 30.

> length(my_seq)
[1] 30

| Great job!

Well done.
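A closer look at what seq() produced, as a sketch. The full argument name is length.out; the shorter length used in the lesson is accepted through R's partial argument matching. The variable step is introduced only for this illustration.

```r
my_seq <- seq(5, 10, length.out = 30)   # `length = 30` in the lesson is matched to `length.out`

length(my_seq)
range(my_seq)             # the endpoints are 5 and 10
step <- diff(my_seq)[1]   # spacing between neighbouring elements
all.equal(step, (10 - 5) / (30 - 1))
```

Thirty points leave 29 gaps between them, which is why the spacing is 5/29 rather than 5/30.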

|==================================================                                       |  57%
| Let's pretend we don't know the length of my_seq, but we want to generate a sequence of integers
| from 1 to N, where N represents the length of the my_seq vector. In other words, we want a new
| vector (1, 2, 3, ...) that is the same length as my_seq.

...

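Suppose we do not know the length of my_seq but want a sequence of integers from 1 to N, where N is the length of my_seq. In other words, we want a new vector (1, 2, 3, ...) of the same length as my_seq.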

|======================================================                                   |  61%
| There are several ways we could do this. One possibility is to combine the `:` operator and the
| length() function like this: 1:length(my_seq). Give that a try.

There are several ways to do this. One is to combine the `:` operator with length(): 1:length(my_seq). Try it.

> 1:length(my_seq)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

| Your dedication is inspiring!

Inspiring.

|==========================================================                               |  65%
| Another option is to use seq(along.with = my_seq). Give that a try.

Another option is seq(along.with = my_seq). Try it.

> seq(along.with = my_seq)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

| All that hard work is paying off!

The hard work is paying off.

|==============================================================                           |  70%
| However, as is the case with many common tasks, R has a separate built-in function for this
| purpose called seq_along(). Type seq_along(my_seq) to see it in action.

As with many common tasks, R has a separate built-in function for this purpose: seq_along(). Type seq_along(my_seq) to see it in action.

> seq_along(my_seq)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

| All that hard work is paying off!

The hard work is paying off.
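The three approaches agree for my_seq, but they differ for an empty vector, which connects to the backwards counting of 15:1 seen earlier. A minimal sketch, with the variable names empty and n_iter made up for this illustration:

```r
empty <- numeric(0)

1:length(empty)     # 1 0 : counts backwards from 1 to 0, usually not what is wanted
seq_along(empty)    # integer(0)

# A for loop over seq_along() therefore runs zero times for an empty vector
n_iter <- 0
for (i in seq_along(empty)) n_iter <- n_iter + 1
n_iter
```

For this reason seq_along() is often the safer choice when writing loops.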

|==================================================================                       |  74%
| There are often several approaches to solving the same problem, particularly in R. Simple
| approaches that involve less typing are generally best. It's also important for your code to be
| readable, so that you and others can figure out what's going on without too much hassle.

...

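In R there are often several ways to solve the same problem. Simple approaches that need less typing are generally best. It also matters that your code (the program text) is readable, so that you and others can work out what it does without much trouble.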

|======================================================================                   |  78%
| If R has a built-in function for a particular task, it's likely that function is highly
| optimized for that purpose and is your best option. As you become a more advanced R programmer,
| you'll design your own functions to perform tasks when there are no better options. We'll
| explore writing your own functions in future lessons.

...

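If R has a built-in function for a task, that function is probably highly optimized for it and is your best option. As you become a more advanced R programmer, you will write your own functions for tasks that have no better option. Writing functions is covered in a later lesson.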

|==========================================================================               |  83%
| One more function related to creating sequences of numbers is rep(), which stands for
| 'replicate'. Let's look at a few uses.

...

One more function for creating sequences of numbers is rep(), which stands for 'replicate'. Let's look at a few uses.

|=============================================================================            |  87%
| If we're interested in creating a vector that contains 40 zeros, we can use rep(0, times = 40).
| Try it out.

To create a vector of 40 zeros, use rep(0, times = 40). Try it.

> rep(0, times = 40)
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

| Nice work!

Nice work.

|=================================================================================        |  91%
| If instead we want our vector to contain 10 repetitions of the vector (0, 1, 2), we can do
| rep(c(0, 1, 2), times = 10). Go ahead.

To create a vector containing 10 repetitions of the vector (0, 1, 2), use rep(c(0, 1, 2), times = 10). Go ahead.

> rep(c(0, 1, 2), times = 10)
 [1] 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2

| Excellent work!

Excellent.

|=====================================================================================    |  96%
| Finally, let's say that rather than repeating the vector (0, 1, 2) over and over again, we want
| our vector to contain 10 zeros, then 10 ones, then 10 twos. We can do this with the `each`
| argument. Try rep(c(0, 1, 2), each = 10).

Finally, instead of repeating (0, 1, 2) over and over, suppose we want 10 zeros, then 10 ones, then 10 twos. That is done with the each argument. Try rep(c(0, 1, 2), each = 10).

> rep(c(0, 1, 2), each = 10)
 [1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2

| Keep working like that and you'll get there!

That's the way. Keep going.
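The times and each arguments can also be combined or given as vectors. The following sketch goes slightly beyond the lesson and shows three further variants; length.out is another argument of rep() in base R.

```r
# A vector for `times` repeats each element a different number of times
rep(c(0, 1, 2), times = c(3, 2, 1))     # 0 0 0 1 1 2

# `each` is applied first, then the whole result is repeated `times` times
rep(c(0, 1, 2), each = 2, times = 2)    # 0 0 1 1 2 2 0 0 1 1 2 2

# `length.out` cuts the repetition off at a fixed length
rep(c(0, 1, 2), length.out = 7)         # 0 1 2 0 1 2 0
```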

|=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

Would you like to receive credit on Coursera.org for completing this course?

Entering 2 brings up a request for a 2-digit token (a proof of completion). It was presumably used by Coursera, one of the MOOC platforms.

Note: at the end of each Lesson of the beginner course R Programming, the message 'Would you like to receive credit for completing this course on Coursera.org?' appears. Select 1: No. (See the swirl Course Repository.)

| Keep up the great work!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

Keep it up.

You have reached the end of the lesson. swirl returns to the main menu.

Choose a course, or enter 0 to exit swirl.

1: R Programming
2: Go to the `swirl` course repository

Selection: 0 (entered by SL)

| Leaving swirl now. Type swirl() to resume.

Leaving swirl. Type swirl() to resume.

> 

2.4. Vectors

Numeric vectors were already introduced briefly in 2.1 Basic Building Blocks; this lesson covers character and logical vectors. How R handles vectors is fundamental, so do not skip this unless you already know it well.
The walkthrough starts at swirl(). Installing swirl and running library(swirl) are omitted; if you are unsure about them, review the beginning of 1.2 or 1.1.

> swirl()

The next steps are the same as before, so translation and commentary are omitted.

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you did then.
| If you are new, call yourself something unique.

What shall I call you? SL

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 4

4: Vectors (Lesson 4) was selected.

  |                                                                                         |   0%

| The simplest and most common data structure in R is the vector.

...

The simplest and most common data structure in R is the vector.

  |==                                                                                       |   3%
| Vectors come in two different flavors: atomic vectors and lists. An atomic vector contains
| exactly one data type, whereas a list may contain multiple data types. We'll explore atomic
| vectors further before we get to lists.

...

Vectors come in two kinds: atomic vectors and lists. An atomic vector contains exactly one data type, whereas a list may contain several. Atomic vectors are explored first, before lists.
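The difference can be seen by putting values of different types into c() and into list(). A minimal sketch; the names mixed and lst are made up for this illustration.

```r
# Mixing types in c() coerces everything to one common type
mixed <- c(1, "a", TRUE)
class(mixed)          # "character"
mixed                 # "1" "a" "TRUE"

# A list keeps each element's own type
lst <- list(1, "a", TRUE)
sapply(lst, class)    # "numeric" "character" "logical"
```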

  |=====                                                                                    |   5%
| In previous lessons, we dealt entirely with numeric vectors, which are one type of atomic
| vector. Other types of atomic vectors include logical, character, integer, and complex. In this
| lesson, we'll take a closer look at logical and character vectors.

...

Previous lessons dealt with numeric vectors, which are one type of atomic vector. Other types of atomic vector are logical, character, integer, and complex. This lesson looks more closely at logical and character vectors.

  |=======                                                                                  |   8%
| Logical vectors can contain the values TRUE, FALSE, and NA (for 'not available'). These values
| are generated as the result of logical 'conditions'. Let's experiment with some simple
| conditions.

...

Logical vectors can contain three values: TRUE, FALSE, and NA ('not available'). These values are produced as the result of logical 'conditions'. Let's try a few simple conditions.

  |=========                                                                                |  11%
| First, create a numeric vector num_vect that contains the values 0.5, 55, -10, and 6.

First, create a numeric vector num_vect containing the values 0.5, 55, -10, and 6.

> num_vect <- c(0.5, 55, -10, 6)

| All that hard work is paying off!

The hard work is paying off.

  |============                                                                             |  13%
| Now, create a variable called tf that gets the result of num_vect < 1, which is read as
| 'num_vect is less than 1'.

Next, create a variable named tf that holds the result of num_vect < 1, the condition 'num_vect is less than 1'.

> tf <- num_vect < 1

| Nice work!

Nice work.

  |==============                                                                           |  16%
| What do you think tf will look like?

1: a single logical value
2: a vector of 4 logical values

What form do you expect tf to take?

1: a single value
2: a vector of four logical values

Selection: 2

| Great job!

Well done.

  |================                                                                         |  18%
| Print the contents of tf now.

Print the contents of tf.

> tf
[1]  TRUE FALSE  TRUE FALSE

| Excellent job!

Excellent.

  |===================                                                                      |  21%
| The statement num_vect < 1 is a condition and tf tells us whether each corresponding element of
| our numeric vector num_vect satisfies this condition.

...

The statement num_vect < 1 is a condition, and tf records, element by element, whether the corresponding element of num_vect satisfies it.

  |=====================                                                                    |  24%
| The first element of num_vect is 0.5, which is less than 1 and therefore the statement 0.5 < 1
| is TRUE. The second element of num_vect is 55, which is greater than 1, so the statement 55 < 1
| is FALSE. The same logic applies for the third and fourth elements.

...

The first element of num_vect is 0.5, which is less than 1, so 0.5 < 1 is TRUE. The second is 55, which is greater than 1, so 55 < 1 is FALSE. The same reasoning applies to the third and fourth elements.

  |=======================                                                                  |  26%
| Let's try another. Type num_vect >= 6 without assigning the result to a new variable.

Let's try another one. Type num_vect >= 6 without assigning the result to a new variable.

> num_vect >= 6
[1] FALSE  TRUE FALSE  TRUE

| You got it!

Exactly.

  |==========================                                                               |  29%
| This time, we are asking whether each individual element of num_vect is greater than OR equal to
| 6. Since only 55 and 6 are greater than or equal to 6, the second and fourth elements of the
| result are TRUE and the first and third elements are FALSE.

...

This time the question is whether each element of num_vect is greater than or equal to 6. Only 55 and 6 are, so the second and fourth elements of the result are TRUE and the first and third are FALSE.
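A logical vector like this one is usually put to work right away. The sketch below uses only base R functions; the variable name cond is made up, and subsetting with square brackets is covered properly in a later lesson.

```r
num_vect <- c(0.5, 55, -10, 6)
cond <- num_vect >= 6

sum(cond)           # TRUE counts as 1, so this is the number of matches: 2
which(cond)         # positions of the matches: 2 4
num_vect[cond]      # the matching values: 55 6
any(num_vect < 0)   # TRUE, because -10 is negative
```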

  |============================                                                             |  32%
| The `<` and `>=` symbols in these examples are called 'logical operators'. Other logical
| operators include `>`, `<=`, `==` for exact equality, and `!=` for inequality.

...

The symbols `<` and `>=` in these examples are called 'logical operators'. Other logical operators are `>`, `<=`, `==` for exact equality, and `!=` for inequality.

  |==============================                                                           |  34%
| If we have two logical expressions, A and B, we can ask whether at least one is TRUE with A | B
| (logical 'or' a.k.a. 'union') or whether they are both TRUE with A & B (logical 'and' a.k.a.
| 'intersection'). Lastly, !A is the negation of A and is TRUE when A is FALSE and vice versa.

...

Given two logical expressions A and B, A | B asks whether at least one of them is TRUE (logical 'or', also known as 'union'), and A & B asks whether both are TRUE (logical 'and', also known as 'intersection'). Finally, !A is the negation of A: it is TRUE when A is FALSE, and FALSE when A is TRUE.
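These operators work element by element on logical vectors, which makes it easy to print a full truth table. A small sketch; the last three lines show how NA ("unknown") behaves, which the lesson text does not cover at this point.

```r
A <- c(TRUE, TRUE, FALSE, FALSE)
B <- c(TRUE, FALSE, TRUE, FALSE)

A | B    #  TRUE  TRUE  TRUE FALSE
A & B    #  TRUE FALSE FALSE FALSE
!A       # FALSE FALSE  TRUE  TRUE

# With NA the result is NA only when the answer really depends on the unknown value
NA | TRUE     # TRUE
NA & FALSE    # FALSE
NA & TRUE     # NA
```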

  |=================================                                                        |  37%
| It's a good idea to spend some time playing around with various combinations of these logical
| operators until you get comfortable with their use. We'll do a few examples here to get you
| started.

...

It is worth spending time on various combinations of these logical operators until they feel comfortable. A few examples follow as a start.

  |===================================                                                      |  39%
| Try your best to predict the result of each of the following statements. You can use pencil and
| paper to work them out if it's helpful. If you get stuck, just guess and you've got a 50% chance
| of getting the right answer!

...

Try to predict the result of each of the following statements. Pencil and paper may help. If you get stuck, just guess: the chance of being right is 50%.

  |=====================================                                                    |  42%
| (3 > 5) & (4 == 4)

1: FALSE
2: TRUE

Is (3 > 5) & (4 == 4) 1: FALSE or 2: TRUE?

Selection: 1

| Perseverance, that's the answer.

Perseverance pays. That is the answer.

  |========================================                                                 |  45%
| (TRUE == TRUE) | (TRUE == FALSE)

1: TRUE
2: FALSE

Is (TRUE == TRUE) | (TRUE == FALSE) 1: TRUE or 2: FALSE?

Selection: 1

| All that practice is paying off!

All that practice is paying off.

  |==========================================                                               |  47%
| ((111 >= 111) | !(TRUE)) & ((4 + 1) == 5)

1: FALSE
2: TRUE

Is ((111 >= 111) | !(TRUE)) & ((4 + 1) == 5) 1: FALSE or 2: TRUE?

Selection: 2

| You are amazing!

Amazing.

  |============================================                                             |  50%
| Don't worry if you found these to be tricky. They're supposed to be. Working with logical
| statements in R takes practice, but your efforts will be rewarded in future lessons (e.g.
| subsetting and control structures).

...

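Don't worry if you found these tricky; they are meant to be. Working with logical statements in R takes practice, but the effort will pay off in later lessons (for example, on subsetting and control structures).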

  |===============================================                                          |  53%
| Character vectors are also very common in R. Double quotes are used to distinguish character
| objects, as in the following example.

...

Character vectors are also very common in R. As the following example shows, character objects are enclosed in double quotes ("") to distinguish them from other objects.

  |=================================================                                        |  55%
| Create a character vector that contains the following words: "My", "name", "is". Remember to
| enclose each word in its own set of double quotes, so that R knows they are character strings.
| Store the vector in a variable called my_char.

Create a character vector containing the words "My", "name", "is". Remember to enclose each word in its own pair of double quotes so that R knows they are character strings. Store the vector in a variable called my_char.

> my_char <- c("My", "name", "is")

| Excellent job!

Excellent.

  |====================================================                                     |  58%
| Print the contents of my_char to see what it looks like.

Print my_char to see what it looks like.

> my_char
[1] "My"   "name" "is"  

| You got it right!

That's right.

  |======================================================                                   |  61%
| Right now, my_char is a character vector of length 3. Let's say we want to join the elements of
| my_char together into one continuous character string (i.e. a character vector of length 1). We
| can do this using the paste() function.

...

Right now my_char is a character vector of length 3. Suppose we want to join its elements into one continuous character string (that is, a character vector of length 1). The paste() function does this.

  |========================================================                                 |  63%
| Type paste(my_char, collapse = " ") now. Make sure there's a space between the double quotes in
| the `collapse` argument. You'll see why in a second.

Type paste(my_char, collapse = " "). Note that there is a single space between the double quotes of the collapse argument. You will see why in a moment.

> paste(my_char, collapse = " ")
[1] "My name is"

| You're the best!

You're the best.

  |===========================================================                              |  66%
| The `collapse` argument to the paste() function tells R that when we join together the elements
| of the my_char character vector, we'd like to separate them with single spaces.

...

The collapse argument of paste() tells R to put a single space between the elements of my_char when joining them.
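The string given to collapse is simply the text placed between neighbouring elements, so it does not have to be a space. A small sketch (the vector name words and the alternative separators are chosen here for illustration):

```r
words <- c("My", "name", "is")

joined_space   <- paste(words, collapse = " ")   # "My name is"
joined_dash    <- paste(words, collapse = "-")   # "My-name-is"
joined_nothing <- paste(words, collapse = "")    # "Mynameis"

# Whatever the separator, the result is a character vector of length 1.
length(joined_space)
```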

  |=============================================================                            |  68%
| It seems that we're missing something.... Ah, yes! Your name!

...

Aren't we missing something? Ah, yes: your name.

  |===============================================================                          |  71%
| To add (or 'concatenate') your name to the end of my_char, use the c() function like this:
| c(my_char, "your_name_here"). Place your name in double quotes where I've put "your_name_here".
| Try it now, storing the result in a new variable called my_name.

To add (or 'concatenate') your name to the end of my_char, use the c() function like this: c(my_char, "your_name_here"). Put your name in double quotes in place of "your_name_here", and store the result in a new variable called my_name.

> my_name <- c(my_char, "SL")

| Keep working like that and you'll get there!

Keep going like that.

  |==================================================================                       |  74%
| Take a look at the contents of my_name.

Take a look at the contents of my_name.

> my_name
[1] "My"   "name" "is"   "SL"  

| That's correct!

That's correct.

  |====================================================================                     |  76%
| Now, use the paste() function once more to join the words in my_name together into a single
| character string. Don't forget to say collapse = " "!

Now use paste() once more to join the words in my_name into a single character string. Don't forget collapse = " ".

> paste(my_name, collapse = " ")
[1] "My name is SL"

| Keep working like that and you'll get there!

Keep going like that.

  |======================================================================                   |  79%
| In this example, we used the paste() function to collapse the elements of a single character
| vector. paste() can also be used to join the elements of multiple character vectors.

...

In this example we used paste() to collapse the elements of one character vector into a single string. paste() can also join the elements of several character vectors.

  |=========================================================================                |  82%
| In the simplest case, we can join two character vectors that are each of length 1 (i.e. join two
| words). Try paste("Hello", "world!", sep = " "), where the `sep` argument tells R that we want
| to separate the joined elements with a single space.

In the simplest case, we join two character vectors that each have length 1 (that is, two words). Type paste("Hello", "world!", sep = " "). The sep argument tells R to separate the joined elements with a single space.

> paste("Hello", "world!", sep = " ")
[1] "Hello world!"

| Excellent job!

Excellent.

  |===========================================================================              |  84%
| For a slightly more complicated example, we can join two vectors, each of length 3. Use paste()
| to join the integer vector 1:3 with the character vector c("X", "Y", "Z"). This time, use sep =
| "" to leave no space between the joined elements.

As a slightly more complicated example, let's join two vectors of length 3. Use paste() to join the integer vector 1:3 with the character vector c("X", "Y", "Z"). This time use sep = "" so that no space is left between the joined elements.

> paste(1:3, c("X", "Y", "Z"), sep = "")
[1] "1X" "2Y" "3Z"

| Keep up the great work!

Keep up the great work.
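The two arguments seen so far do different jobs: sep goes between the pieces that are joined element by element, while collapse goes between the resulting elements when they are merged into one string. The sketch below combines them; the separators "_" and ", " are arbitrary choices made here, not part of the lesson.

```r
# sep joins corresponding elements; the result still has length 3.
ids <- paste(1:3, c("X", "Y", "Z"), sep = "_")
ids          # "1_X" "2_Y" "3_Z"

# Adding collapse then merges those three elements into one string.
one_string <- paste(1:3, c("X", "Y", "Z"), sep = "_", collapse = ", ")
one_string   # "1_X, 2_Y, 3_Z"
```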

  |=============================================================================            |  87%
| What do you think will happen if our vectors are of different length? (Hint: we talked about
| this in a previous lesson.)

...

What do you think happens if the vectors have different lengths? (Hint: we covered this in an earlier lesson.)

  |================================================================================         |  89%
| Vector recycling! Try paste(LETTERS, 1:4, sep = "-"), where LETTERS is a predefined variable in
| R containing a character vector of all 26 letters in the English alphabet.

Vector recycling! Try paste(LETTERS, 1:4, sep = "-"). LETTERS is a predefined variable in R: a character vector holding the 26 uppercase letters of the English alphabet.

> paste(LETTERS, 1:4, sep = "-")
 [1] "A-1" "B-2" "C-3" "D-4" "E-1" "F-2" "G-3" "H-4" "I-1" "J-2" "K-3" "L-4" "M-1" "N-2" "O-3"
[16] "P-4" "Q-1" "R-2" "S-3" "T-4" "U-1" "V-2" "W-3" "X-4" "Y-1" "Z-2"

| Great job!

Great job.

  |==================================================================================       |  92%
| Since the character vector LETTERS is longer than the numeric vector 1:4, R simply recycles, or
| repeats, 1:4 until it matches the length of LETTERS.

...

Because the character vector LETTERS is longer than the numeric vector 1:4, R recycles (repeats) 1:4 until it matches the length of LETTERS.
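Recycling can be made visible by doing the repetition by hand. The sketch below uses rep_len(), a base R function that is not part of the swirl lesson, to stretch 1:4 to length 26 and then checks that the result matches what paste() does on its own.

```r
short <- 1:4

# Repeat 1, 2, 3, 4 until the vector has as many elements as LETTERS.
recycled <- rep_len(short, length(LETTERS))
recycled

explicit <- paste(LETTERS, recycled, sep = "-")  # lengths already match
implicit <- paste(LETTERS, 1:4, sep = "-")       # R recycles 1:4 itself

identical(explicit, implicit)   # TRUE
```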

  |====================================================================================     |  95%
| Also worth noting is that the numeric vector 1:4 gets 'coerced' into a character vector by the
| paste() function.

...

Also note that paste() 'coerces' the numeric vector 1:4 into a character vector.

  |=======================================================================================  |  97%
| We'll discuss coercion in another lesson, but all it really means is that the numbers 1, 2, 3,
| and 4 in the output above are no longer numbers to R, but rather characters "1", "2", "3", and
| "4".

...

Coercion is covered in another lesson. In short, the 1, 2, 3, and 4 in the output above are no longer numbers to R but the characters "1", "2", "3", and "4".
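The change of type can be checked with class(). This is a minimal sketch; as.character() and as.numeric() are base R conversion functions that the lesson itself does not use at this point.

```r
nums <- 1:4
class(nums)            # "integer"

labels <- paste(nums)  # paste() coerces the numbers to text
labels                 # "1" "2" "3" "4"
class(labels)          # "character"

# as.character() performs the same conversion explicitly,
# and as.numeric() converts the text back to numbers.
same <- identical(labels, as.character(nums))
back <- as.numeric(labels)
```

Arithmetic such as labels + 1 fails with an error, because the elements are now text rather than numbers.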

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No

Selection: 2

Would you like to receive credit on Coursera.org for completing this course?

If you enter 1, you are asked to enter a two-digit token (a proof of completion). It was presumably used by Coursera, one of the MOOCs.

Note: At the end of each lesson of the beginner course R Programming, the message 'Would you like to receive credit for completing this course on Coursera.org?' is displayed. Select 2: No. (See the swirl Course Repository.)

| Excellent work!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

Excellent work.

You have reached the end of this lesson and are returned to the main menu, where you choose a course or type 0 to exit swirl. Here SL entered 0.

| Leaving swirl now. Type swirl() to resume.

swirl is now exiting. Type swirl() to resume.

> 

2.5. Missing Values

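This lesson covers the missing value NA and NaN, which arises in numerical computation. Both are issues you always need to keep in mind when working with real data.
The walkthrough starts from swirl(). Installing swirl and running library(swirl) are omitted; if you are unsure about them, review section 1.2 or the beginning of section 1.1.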

> swirl()

The next steps are the same as before, so the translation and commentary are omitted.

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you did then.
| If you are new, call yourself something unique.

What shall I call you? SL

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 5

We selected 5: Missing Values (Lesson 5).

  |                                                                                         |   0%

| Missing values play an important role in statistics and data analysis. Often, missing values
| must not be ignored, but rather they should be carefully studied to see if there's an underlying
| pattern or cause for their missingness.

...

Missing values play an important role in statistics and data analysis. Often they must not be ignored; instead they should be studied carefully to see whether there is an underlying pattern or cause behind their missingness.

  |====                                                                                     |   5%
| In R, NA is used to represent any value that is 'not available' or 'missing' (in the statistical
| sense). In this lesson, we'll explore missing values further.

...

In R, NA represents any value that is 'not available' or 'missing' (in the statistical sense). This lesson explores missing values further.

  |=========                                                                                |  10%
| Any operation involving NA generally yields NA as the result. To illustrate, let's create a
| vector c(44, NA, 5, NA) and assign it to a variable x.

Any operation involving NA generally yields NA as the result. To illustrate, create the vector c(44, NA, 5, NA) and assign it to a variable x.

> x <- c(44, NA, 5, NA)

| You are quite good my friend!

You are doing quite well.

  |=============                                                                            |  15%
| Now, let's multiply x by 3.

Now multiply x by 3.

> x*3
[1] 132  NA  15  NA

| You got it!

You got it.
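The same propagation applies to other arithmetic and to summary functions. The sketch below reuses the vector from the exercise; the na.rm = TRUE argument of sum() is a base R option that this lesson does not cover, shown here only as one way to skip missing values.

```r
x <- c(44, NA, 5, NA)

shifted <- x + 1     # 45 NA  6 NA: each NA stays NA
total   <- sum(x)    # NA: one unknown value makes the total unknown

# Assumption for illustration: drop the NAs before summing.
total_known <- sum(x, na.rm = TRUE)   # 49
```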

  |==================                                                                       |  20%
| Notice that the elements of the resulting vector that correspond with the NA values in x are
| also NA.

...

Notice that the elements of the result that correspond to the NA values in x are also NA.

  |======================                                                                   |  25%
| To make things a little more interesting, lets create a vector containing 1000 draws from a
| standard normal distribution with y <- rnorm(1000).

To make things a little more interesting, create a vector of 1000 draws from a standard normal distribution with y <- rnorm(1000).

rnorm() is explained briefly in the remarks at the end of this lesson.

> y <- rnorm(1000)

| Great job!

Great job.

  |===========================                                                              |  30%
| Next, let's create a vector containing 1000 NAs with z <- rep(NA, 1000).

Next, create a vector containing 1000 NAs with z <- rep(NA, 1000).

> z <- rep(NA, 1000)

| You are doing so well!

You are doing so well.

  |===============================                                                          |  35%
| Finally, let's select 100 elements at random from these 2000 values (combining y and z) such
| that we don't know how many NAs we'll wind up with or what positions they'll occupy in our final
| vector -- my_data <- sample(c(y, z), 100).

Finally, select 100 elements at random from these 2000 values (y and z combined), so that we do not know how many NAs we end up with or where they fall in the final vector: my_data <- sample(c(y, z), 100).

> my_data <- sample(c(y, z), 100)

| You're the best!

You're the best.

  |====================================                                                     |  40%
| Let's first ask the question of where our NAs are located in our data. The is.na() function
| tells us whether each element of a vector is NA. Call is.na() on my_data and assign the result
| to my_na.

> my_na <- is.na(my_data)

| You got it!

You got it.

  |========================================                                                 |  45%
| Now, print my_na to see what you came up with.

Print my_na to see what it looks like.

> my_na
  [1] FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
 [13] FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE
 [25] FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
 [37] FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE
 [49] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE
 [61] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
 [73] FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
 [85]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE
 [97]  TRUE FALSE FALSE  TRUE

| Excellent work!

Excellent work.

  |============================================                                             |  50%
| Everywhere you see a TRUE, you know the corresponding element of my_data is NA. Likewise,
| everywhere you see a FALSE, you know the corresponding element of my_data is one of our random
| draws from the standard normal distribution.

...

Wherever you see TRUE, the corresponding element of my_data is NA. Likewise, wherever you see FALSE, the corresponding element of my_data is one of the random draws from the standard normal distribution.

  |=================================================                                        |  55%
| In our previous discussion of logical operators, we introduced the `==` operator as a method of
| testing for equality between two objects. So, you might think the expression my_data == NA
| yields the same results as is.na(). Give it a try.

When we discussed logical operators, we introduced == as a way to test whether two objects are equal. So you might expect my_data == NA to give the same result as is.na(). Try it.

> my_data == NA
  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [76] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| You got it!

You got it.

  |=====================================================                                    |  60%
| The reason you got a vector of all NAs is that NA is not really a value, but just a placeholder
| for a quantity that is not available. Therefore the logical expression is incomplete and R has
| no choice but to return a vector of the same length as my_data that contains all NAs.

...

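You got a vector of all NAs because NA is not really a value; it is just a placeholder for a quantity that is not available. The logical expression is therefore incomplete, and R has no choice but to return a vector of the same length as my_data that contains only NAs.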
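The difference between the two tests is easy to see on a short vector. In this sketch the vector v is made up for illustration, and anyNA() is a base R helper that the lesson does not mention.

```r
v <- c(1.5, NA, -2)

eq_result <- v == NA     # NA NA NA: comparing with an unknown gives unknown
na_flags  <- is.na(v)    # FALSE TRUE FALSE: the test that actually works

NA == NA                 # NA: even two NAs are not known to be equal
has_missing <- anyNA(v)  # TRUE if at least one element is NA
```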

  |==========================================================                               |  65%
| Don't worry if that's a little confusing. The key takeaway is to be cautious when using logical
| expressions anytime NAs might creep in, since a single NA value can derail the entire thing.

...

Don't worry if that is a little confusing. The key takeaway is to be careful with logical expressions whenever NAs might creep in, because a single NA can derail the whole thing.

  |==============================================================                           |  70%
| So, back to the task at hand. Now that we have a vector, my_na, that has a TRUE for every NA and
| FALSE for every numeric value, we can compute the total number of NAs in our data.

...

Back to the task at hand. Now that we have a vector, my_na, with TRUE for every NA and FALSE for every numeric value, we can count the total number of NAs in the data.

  |===================================================================                      |  75%
| The trick is to recognize that underneath the surface, R represents TRUE as the number 1 and
| FALSE as the number 0. Therefore, if we take the sum of a bunch of TRUEs and FALSEs, we get the
| total number of TRUEs.

...

The trick is that, under the surface, R represents TRUE as the number 1 and FALSE as the number 0. So the sum of a collection of TRUEs and FALSEs is the number of TRUEs.
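A four-element example makes the counting concrete. The vector flags is invented for this sketch, and taking mean() of a logical vector to get the proportion of TRUEs goes one step beyond what the lesson states.

```r
flags <- c(TRUE, FALSE, TRUE, TRUE)

as.integer(flags)          # 1 0 1 1: the numbers R uses behind the scenes
n_true <- sum(flags)       # 3: the count of TRUEs
share_true <- mean(flags)  # 0.75: the proportion of TRUEs
```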

  |=======================================================================                  |  80%
| Let's give that a try here. Call the sum() function on my_na to count the total number of TRUEs
| in my_na, and thus the total number of NAs in my_data. Don't assign the result to a new
| variable.

Try it here. Call sum() on my_na to count the TRUEs in my_na, which is also the number of NAs in my_data. Don't assign the result to a new variable.

> sum(my_na)
[1] 57

| You're the best!

You're the best.

Note: sample() draws at random, so your answer may differ. Reading the help file for sample (?sample) is recommended.
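If a repeatable result is wanted, the random number generator can be seeded first. This is a sketch outside the swirl lesson: set.seed() is a base R function, the seed 123 is an arbitrary choice, and the vectors are kept small for readability.

```r
set.seed(123)   # arbitrary seed, chosen only for this illustration
first <- sample(c(rnorm(5), rep(NA, 5)), 4)

set.seed(123)   # the same seed restarts the same random sequence
second <- sample(c(rnorm(5), rep(NA, 5)), 4)

identical(first, second)   # TRUE: both draws are the same
```

Without the calls to set.seed(), the two draws would generally differ, which is why the count of NAs in the text varies from run to run.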

  |============================================================================             |  85%
| Pretty cool, huh? Finally, let's take a look at the data to convince ourselves that everything
| 'adds up'. Print my_data to the console.

Pretty cool, isn't it? Finally, print my_data to the console to confirm that everything adds up.

> my_data
  [1] -0.147069295 -0.963382444           NA  0.780414679  0.642463844
  [6]           NA           NA           NA           NA           NA
 [11] -0.378350540           NA  0.570733419           NA           NA
 [16]           NA  0.377565638  0.279946765           NA -0.092303691
 [21]  0.192300169           NA           NA  0.567244011 -1.477804253
 [26]           NA  1.318418219           NA  0.545210059           NA
 [31]           NA           NA           NA -0.285524944 -0.417251608
 [36]           NA -0.599391625           NA  0.651264167           NA
 [41]           NA           NA           NA  0.133847061           NA
 [46] -1.490864230           NA  0.973187482  0.749899280           NA
 [51]           NA           NA           NA           NA -0.096867217
 [56] -0.007318112           NA -0.370854068 -0.853133795           NA
 [61]  0.399979909 -0.997494168  1.068418212           NA           NA
 [66] -1.312858919 -0.989972051           NA           NA           NA
 [71]           NA  0.284891111 -0.353865109 -1.115493405  0.179683966
 [76]           NA  1.082757418           NA           NA           NA
 [81]           NA           NA  1.162159943           NA           NA
 [86]           NA -0.596187325           NA           NA           NA
 [91]           NA -0.717349451 -0.079576351  0.023318528           NA
 [96]           NA           NA  2.830198212  0.183532019           NA

| Excellent work!

Excellent work.

  |================================================================================         |  90%
| Now that we've got NAs down pat, let's look at a second type of missing value -- NaN, which
| stands for 'not a number'. To generate NaN, try dividing (using a forward slash) 0 by 0 now.

Now that we have NA down, let's look at a second kind of missing value: NaN, which stands for 'not a number'. To generate NaN, divide 0 by 0 (using a forward slash, /).

> 0/0
[1] NaN

| Your dedication is inspiring!

Your dedication is inspiring.

  |=====================================================================================    |  95%
| Let's do one more, just for fun. In R, Inf stands for infinity. What happens if you subtract Inf
| from Inf?

Let's do one more, just for fun. In R, Inf stands for infinity. What happens if you subtract Inf from Inf?

> Inf - Inf
[1] NaN

| You are doing so well!

You are doing so well.
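NA, NaN, and Inf can be told apart with three base R test functions. The lesson uses only is.na(); is.nan() and is.finite() are added in this sketch for comparison, and the vector vals is made up for illustration.

```r
vals <- c(0/0, Inf - Inf, NA, 1/0, 3)
vals                          # NaN NaN  NA Inf   3

nan_flags    <- is.nan(vals)     # TRUE TRUE FALSE FALSE FALSE
na_flags     <- is.na(vals)      # TRUE TRUE TRUE  FALSE FALSE
finite_flags <- is.finite(vals)  # FALSE FALSE FALSE FALSE TRUE
```

Note that is.na() is TRUE for NaN as well as for NA, whereas is.nan() is TRUE only for NaN.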

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No

Selection: 2

Would you like to receive credit on Coursera.org for completing this course?

Select 2. This has been explained several times, so the details are omitted.

| All that hard work is paying off!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

All that hard work is paying off.

You have reached the end of this lesson and are returned to the main menu, where you choose a course or type 0 to exit swirl. Here SL entered 0.

| Leaving swirl now. Type swirl() to resume.

swirl is now exiting. Type swirl() to resume.

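Remarks:

Above we set y <- rnorm(1000). The help page for rnorm is titled The Normal Distribution and gives the usage rnorm(n, mean = 0, sd = 1). The normal distribution is explained later; mean is the mean and sd is the standard deviation. Writing mean = 0, sd = 1 in the usage means that these are the default values. The normal distribution with mean = 0 and sd = 1 is called the standard normal distribution. So if you specify only n, rnorm generates a numeric vector of n draws from the standard normal distribution, with mean 0 and standard deviation 1. Because it uses R's internal random number generator, a different vector is produced each time. Below we compute the mean and the standard deviation of y and of a newly created vector my_norm.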

> mean(y)
[1] -0.03802323
> sd(y)
[1] 0.9820727
> my_norm <- rnorm(1000)
> mean(my_norm)
[1] 0.05933914
> sd(my_norm)
[1] 0.9811571

The mean is close to 0 and the standard deviation is close to 1. The following command draws a histogram (a plot of the frequency distribution) of my_norm.

> hist(my_norm)
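The mean and sd arguments can also be set explicitly. The sketch below is not part of the swirl lesson: the values mean = 10 and sd = 2 and the seed are arbitrary choices, and set.seed() is used only so that repeated runs give the same numbers.

```r
set.seed(1)   # arbitrary seed, for repeatability only

std     <- rnorm(1000)                      # defaults: mean = 0, sd = 1
shifted <- rnorm(1000, mean = 10, sd = 2)   # hypothetical example values

# The sample statistics should land near the requested parameters.
c(mean(std), sd(std))
c(mean(shifted), sd(shifted))
```

Calling hist(shifted) would show the same bell shape as before, centred near 10 instead of 0 and about twice as wide.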

2.6. Subsetting Vectors

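This lesson covers the four kinds of index vector used to extract part of a vector. Being able to use all of them freely makes it easier to choose the one that suits each situation.
The walkthrough starts from swirl(). Installing swirl and running library(swirl) are omitted; if you are unsure about them, review section 1.2 or the beginning of section 1.1.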

> swirl()

The next steps are the same as before, so the translation and commentary are omitted.

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you did then.
| If you are new, call yourself something unique.

What shall I call you? SL

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 6

We selected 6: Subsetting Vectors (Lesson 6).

  |                                                                                         |   0%

| In this lesson, we'll see how to extract elements from a vector based on some conditions that we
| specify.

...

In this lesson we see how to extract elements from a vector based on conditions that we specify. There are four kinds of index vector in all.

  |==                                                                                       |   3%
| For example, we may only be interested in the first 20 elements of a vector, or only the
| elements that are not NA, or only those that are positive or correspond to a specific variable
| of interest. By the end of this lesson, you'll know how to handle each of these scenarios.

...

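For example, we may be interested only in the first 20 elements of a vector, only in the elements that are not NA, or only in those that are positive or that correspond to a specific variable of interest. By the end of this lesson you will know how to handle each of these cases.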

  |=====                                                                                    |   5%
| I've created for you a vector called x that contains a random ordering of 20 numbers (from a
| standard normal distribution) and 20 NAs. Type x now to see what it looks like.

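I have created a vector called x that contains 20 numbers (from a standard normal distribution) and 20 NAs in random order. Type x to see what it looks like.

Can you tell how x could be created? One way is shown at the end of this lesson. The pieces were covered in the previous lesson, so it is worth thinking it through yourself.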

> x
 [1]           NA           NA  0.672175862           NA  0.008344719           NA           NA
 [8]  1.456607612 -0.407798133 -1.356906989 -0.117210236 -0.163337577           NA -1.089090700
[15] -0.780951863           NA           NA -0.236279138           NA           NA  1.442420911
[22]           NA  0.739574674           NA -0.512273194           NA  1.719594992           NA
[29]           NA           NA           NA  0.352762824           NA -0.775962958           NA
[36] -1.636275451  0.402442715  1.424962109 -0.525836783           NA

| That's a job well done!

A job well done.

  |=======                                                                                  |   8%
| The way you tell R that you want to select some particular elements (i.e. a 'subset') from a
| vector is by placing an 'index vector' in square brackets immediately following the name of the
| vector.

...

To select particular elements (a 'subset') of a vector, place an 'index vector' in square brackets ([]) immediately after the name of the vector.

  |=========                                                                                |  10%
| For a simple example, try x[1:10] to view the first ten elements of x.

As a simple example, try x[1:10] to view the first ten elements of x.

> x[1:10]
 [1]           NA           NA  0.672175862           NA  0.008344719           NA           NA
 [8]  1.456607612 -0.407798133 -1.356906989

| All that hard work is paying off!

All that hard work is paying off.

  |===========                                                                              |  13%
| Index vectors come in four different flavors -- logical vectors, vectors of positive integers,
| vectors of negative integers, and vectors of character strings -- each of which we'll cover in
| this lesson.

...

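Index vectors come in four flavors: logical vectors, vectors of positive integers, vectors of negative integers, and vectors of character strings. This lesson covers each of them.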
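As a preview, the four flavors can be compared side by side on one small vector. This is a sketch with made-up data: the named vector v and its element names a to d are assumptions introduced here so that the character-string flavor has names to match.

```r
v <- c(a = 10, b = 20, c = 30, d = 40)

by_logical  <- v[c(TRUE, FALSE, TRUE, FALSE)]  # keeps positions marked TRUE
by_positive <- v[c(1, 4)]                      # keeps positions 1 and 4
by_negative <- v[-c(1, 4)]                     # drops positions 1 and 4
by_name     <- v[c("b", "d")]                  # keeps elements named b and d
```

Here by_logical holds 10 and 30, by_positive holds 10 and 40, by_negative holds 20 and 30, and by_name holds 20 and 40.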

  |==============                                                                           |  15%
| Let's start by indexing with logical vectors. One common scenario when working with real-world
| data is that we want to extract all elements of a vector that are not NA (i.e. missing data).
| Recall that is.na(x) yields a vector of logical values the same length as x, with TRUEs
| corresponding to NA values in x and FALSEs corresponding to non-NA values in x.

...

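Let's start with indexing by logical vectors. A common situation with real-world data is wanting to extract all elements of a vector that are not NA (missing data). Recall that is.na(x) gives a logical vector of the same length as x, with TRUE where x is NA and FALSE where it is not.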

  |================                                                                         |  18%
| What do you think x[is.na(x)] will give you?

1: A vector of length 0
2: A vector of TRUEs and FALSEs
3: A vector of all NAs
4: A vector with no NAs

What do you think x[is.na(x)] will give you? Choose from 1: a vector of length 0, 2: a vector of TRUEs and FALSEs, 3: a vector of all NAs, 4: a vector with no NAs.

Selection: 3

| Excellent job!

Excellent.

  |==================                                                                       |  21%
| Prove it to yourself by typing x[is.na(x)].

Confirm your answer by typing x[is.na(x)].

> x[is.na(x)]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| Excellent job!

Excellent.

  |=====================                                                                    |  23%
| Recall that `!` gives us the negation of a logical expression, so !is.na(x) can be read as 'is
| not NA'. Therefore, if we want to create a vector called y that contains all of the non-NA
| values from x, we can use y <- x[!is.na(x)]. Give it a try.

Recall that ! gives the negation of a logical expression, so !is.na(x) can be read as 'is not NA'. To create a vector y that contains all the non-NA values of x, use y <- x[!is.na(x)]. Try it.

> y <- x[!is.na(x)]

| You got it right!

You got it right.

  |=======================                                                                  |  26%
| Print y to the console.

Print y to the console.

> y
 [1]  0.672175862  0.008344719  1.456607612 -0.407798133 -1.356906989 -0.117210236 -0.163337577
 [8] -1.089090700 -0.780951863 -0.236279138  1.442420911  0.739574674 -0.512273194  1.719594992
[15]  0.352762824 -0.775962958 -1.636275451  0.402442715  1.424962109 -0.525836783

| Excellent job!

Excellent.

  |=========================                                                                |  28%
| Now that we've isolated the non-missing values of x and put them in y, we can subset y as we
| please.

...

Now that we have isolated the non-missing values of x and stored them in y, we can subset y as we please.

  |===========================                                                              |  31%
| Recall that the expression y > 0 will give us a vector of logical values the same length as y,
| with TRUEs corresponding to values of y that are greater than zero and FALSEs corresponding to
| values of y that are less than or equal to zero. What do you think y[y > 0] will give you?

1: A vector of all the positive elements of y
2: A vector of all NAs
3: A vector of TRUEs and FALSEs
4: A vector of all the negative elements of y
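5: A vector of length 0

Recall that y > 0 gives a logical vector of the same length as y, with TRUE where y is greater than zero and FALSE where y is less than or equal to zero. What do you think y[y > 0] will give you? Choose from 1: a vector of all the positive elements of y, 2: a vector of all NAs, 3: a vector of TRUEs and FALSEs, 4: a vector of all the negative elements of y, 5: a vector of length 0.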

Selection: 1

| You are quite good my friend!

You are doing quite well.

  |==============================                                                           |  33%
| Type y[y > 0] to see that we get all of the positive elements of y, which are also the positive
| elements of our original vector x.

Type y[y > 0] to see that it returns all the positive elements of y, which are also the positive elements of the original vector x.

> y[y > 0]
[1] 0.672175862 0.008344719 1.456607612 1.442420911 0.739574674 1.719594992 0.352762824 0.402442715
[9] 1.424962109

| All that practice is paying off!

All that practice is paying off.

  |================================                                                         |  36%
| You might wonder why we didn't just start with x[x > 0] to isolate the positive elements of x.
| Try that now to see why.

You might wonder why we did not simply start with x[x > 0] to isolate the positive elements of x. Try it now to see why.

> x[x > 0]
 [1]          NA          NA 0.672175862          NA 0.008344719          NA          NA
 [8] 1.456607612          NA          NA          NA          NA          NA 1.442420911
[15]          NA 0.739574674          NA          NA 1.719594992          NA          NA
[22]          NA          NA 0.352762824          NA          NA 0.402442715 1.424962109
[29]          NA

| Nice work!

Nice work.

  |==================================                                                       |  38%
| Since NA is not a value, but rather a placeholder for an unknown quantity, the expression NA > 0
| evaluates to NA. Hence we get a bunch of NAs mixed in with our positive numbers when we do this.

...

NA is not a value but a placeholder for an unknown quantity, so NA > 0 evaluates to NA. As a result we get a bunch of NAs mixed in with the positive numbers.

  |=====================================                                                    |  41%
| Combining our knowledge of logical operators with our new knowledge of subsetting, we could do
| this -- x[!is.na(x) & x > 0]. Try it out.

Combining what we know about logical operators with what we have just learned about subsetting, we can write x[!is.na(x) & x > 0]. Try it.

> x[!is.na(x) & x > 0]
[1] 0.672175862 0.008344719 1.456607612 1.442420911 0.739574674 1.719594992 0.352762824 0.402442715
[9] 1.424962109

| You are quite good my friend!

You are doing quite well.

  |=======================================                                                  |  44%
| In this case, we request only values of x that are both non-missing AND greater than zero.

...

In this case we request only the values of x that are both non-missing and greater than zero.
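The effect of the extra !is.na() condition is easiest to follow on a short vector. In this sketch the vector v is made up for illustration, and which() is a base R function not used in the lesson; it is shown only as an alternative that also leaves the NAs out.

```r
v <- c(NA, 2.5, -1, NA, 0.3)

v > 0                            # NA TRUE FALSE NA TRUE
with_na <- v[v > 0]              # NA 2.5 NA 0.3: each NA index yields NA

# FALSE & NA is FALSE, so the NA positions are excluded.
clean <- v[!is.na(v) & v > 0]    # 2.5 0.3

# which() returns only the positions that are TRUE, skipping NA.
via_which <- v[which(v > 0)]     # 2.5 0.3
```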

  |=========================================                                                |  46%
| I've already shown you how to subset just the first ten values of x using x[1:10]. In this case,
| we're providing a vector of positive integers inside of the square brackets, which tells R to
| return only the elements of x numbered 1 through 10.

...

I have already shown how to subset the first ten values of x with x[1:10]. In that case a vector of positive integers goes inside the square brackets, telling R to return only the elements of x numbered 1 through 10.

  |===========================================                                              |  49%
| Many programming languages use what's called 'zero-based indexing', which means that the first
| element of a vector is considered element 0. R uses 'one-based indexing', which (you guessed
| it!) means the first element of a vector is considered element 1.

...

Many programming languages start indexes at 0, so the first element is element 0. R, as you may have guessed, starts at 1: the first element is element 1.

  |==============================================                                           |  51%
| Can you figure out how we'd subset the 3rd, 5th, and 7th elements of x? Hint -- Use the c()
| function to specify the element numbers as a numeric vector.

Can you work out how to extract the 3rd, 5th, and 7th elements of x? Hint: use c() to give the element numbers as a numeric vector.

> x[c(3,5,7)]
[1] 0.672175862 0.008344719          NA

| Keep up the great work!

Keep it up.

  |================================================                                         |  54%
| It's important that when using integer vectors to subset our vector x, we stick with the set of
| indexes {1, 2, ..., 40} since x only has 40 elements. What happens if we ask for the zeroth
| element of x (i.e. x[0])? Give it a try.

When subsetting the vector x with integer indexes, it is important to stay within {1, 2, ..., 40}, because x has only 40 elements. What happens if we ask for the zeroth element? Try x[0].

> x[0]
numeric(0)

| All that practice is paying off!

Your practice is paying off.

  |==================================================                                       |  56%
| As you might expect, we get nothing useful. Unfortunately, R doesn't prevent us from doing this.
| What if we ask for the 3000th element of x? Try it out.

The result is nothing useful, but we also see that R does not stop us from doing this. What about the 3000th element of x? Try it.

> x[3000]
[1] NA

| Keep working like that and you'll get there!

Keep going like that.

  |====================================================                                     |  59%
| Again, nothing useful, but R doesn't prevent us from asking for it. This should be a cautionary
| tale. You should always make sure that what you are asking for is within the bounds of the
| vector you're working with.

...

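Again nothing useful, and again R does not prevent the request. Take this as a warning: always make sure the indexes you ask for lie within the bounds of the vector you are working with.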
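The sketch below reproduces both out-of-range cases on a hypothetical three-element vector and adds one possible guard. The helper safe_get() is an illustrative name invented here; it is not part of swirl or base R.

```r
# Hypothetical 3-element vector (the lesson's x has 40 elements)
x <- c(10, 20, 30)

zero <- x[0]     # numeric(0): an empty vector, no error
beyond <- x[5]   # NA: index past the end, no error

# Hypothetical helper: stop with an error when any index is out of range
safe_get <- function(v, i) {
  stopifnot(all(i >= 1), all(i <= length(v)))
  v[i]
}

print(zero)
print(beyond)
print(safe_get(x, 2))
```

A check like this is mainly useful inside functions, where a silent NA could otherwise go unnoticed.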

  |=======================================================                                  |  62%
| What if we're interested in all elements of x EXCEPT the 2nd and 10th? It would be pretty
| tedious to construct a vector containing all numbers 1 through 40 EXCEPT 2 and 10.

...

What if we want every element of x except the 2nd and 10th? Building a vector of all numbers from 1 to 40 without 2 and 10 would be tedious.

  |=========================================================                                |  64%
| Luckily, R accepts negative integer indexes. Whereas x[c(2, 10)] gives us ONLY the 2nd and 10th
| elements of x, x[c(-2, -10)] gives us all elements of x EXCEPT for the 2nd and 10 elements.  Try
| x[c(-2, -10)] now to see this.

R accepts negative integers as indexes. x[c(2, 10)] returns only the 2nd and 10th elements of x, whereas x[c(-2, -10)] returns every element except the 2nd and 10th. Try it.

> x[c(-2, -10)]
 [1]           NA  0.672175862           NA  0.008344719           NA           NA  1.456607612
 [8] -0.407798133 -0.117210236 -0.163337577           NA -1.089090700 -0.780951863           NA
[15]           NA -0.236279138           NA           NA  1.442420911           NA  0.739574674
[22]           NA -0.512273194           NA  1.719594992           NA           NA           NA
[29]           NA  0.352762824           NA -0.775962958           NA -1.636275451  0.402442715
[36]  1.424962109 -0.525836783           NA

| All that hard work is paying off!

Hard work pays off.

  |===========================================================                              |  67%
| A shorthand way of specifying multiple negative numbers is to put the negative sign out in front
| of the vector of positive numbers. Type x[-c(2, 10)] to get the exact same result.

A shorter way is to put the minus sign in front of a vector of positive numbers. x[-c(2, 10)] gives the same result.

> x[-c(2, 10)]
 [1]           NA  0.672175862           NA  0.008344719           NA           NA  1.456607612
 [8] -0.407798133 -0.117210236 -0.163337577           NA -1.089090700 -0.780951863           NA
[15]           NA -0.236279138           NA           NA  1.442420911           NA  0.739574674
[22]           NA -0.512273194           NA  1.719594992           NA           NA           NA
[29]           NA  0.352762824           NA -0.775962958           NA -1.636275451  0.402442715
[36]  1.424962109 -0.525836783           NA

| You got it right!

That is right.
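A small sketch, on a hypothetical five-element vector, confirming that the two negative-index spellings agree. The last line goes beyond the lesson: as far as base R behaves, positive and negative indexes cannot be mixed in one call, so the error is caught here for demonstration.

```r
# Hypothetical 5-element vector
x <- c(11, 12, 13, 14, 15)

a <- x[c(-2, -4)]   # drop the 2nd and 4th elements
b <- x[-c(2, 4)]    # same thing, with the minus sign in front of the vector

print(a)
print(identical(a, b))

# Mixing signs is an error; tryCatch() turns it into a marker string
mixed <- tryCatch(x[c(-1, 2)], error = function(e) "error")
print(mixed)
```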

  |==============================================================                           |  69%
| So far, we've covered three types of index vectors -- logical, positive integer, and negative
| integer. The only remaining type requires us to introduce the concept of 'named' elements.

...

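So far we have covered three kinds of index vectors: logical, positive integer, and negative integer. The last kind, subsetting by name, requires the idea of named elements.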

  |================================================================                         |  72%
| Create a numeric vector with three named elements using vect <- c(foo = 11, bar = 2, norf = NA).

Create a numeric vector with three named elements using vect <- c(foo = 11, bar = 2, norf = NA).

> vect <- c(foo = 11, bar = 2, norf = NA)

| All that practice is paying off!

Your practice is paying off.

  |==================================================================                       |  74%
| When we print vect to the console, you'll see that each element has a name. Try it out.

Printing vect to the Console shows that each element has a name. Try it.

> vect
 foo  bar norf 
  11    2   NA 

| That's the answer I was looking for.

2.11. vapply and tapply

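This lesson covers vapply and tapply, relatives of the lapply and sapply functions from the previous lesson.
The translation and notes start from the lesson selection. If anything is unclear, review 1.2 or the beginning of 1.1.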

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 11

We selected 11: vapply and tapply (lesson 11).

  |                                                                                         |   0%

| In the last lesson, you learned about the two most fundamental members of R's *apply family of
| functions: lapply() and sapply(). Both take a list as input, apply a function to each element of
| the list, then combine and return the result. lapply() always returns a list, whereas sapply()
| attempts to simplify the result.

...

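The previous lesson introduced the two most fundamental members of R's *apply family: lapply() and sapply(). Both take a list as input, apply a function to each element, then combine and return the result. lapply() always returns a list, whereas sapply() tries to simplify the result.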

  |====                                                                                     |   4%
| In this lesson, you'll learn how to use vapply() and tapply(), each of which serves a very
| specific purpose within the Split-Apply-Combine methodology. For consistency, we'll use the same
| dataset we used in the 'lapply and sapply' lesson.

...

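This lesson covers vapply() and tapply(), each of which serves a specific purpose within the Split-Apply-Combine approach (split the data, apply a function, combine the results). For consistency, we use the same dataset as in the 'lapply and sapply' lesson.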

  |=======                                                                                  |   8%
| The Flags dataset from the UCI Machine Learning Repository contains details of various nations
| and their flags. More information may be found here:
| http://archive.ics.uci.edu/ml/datasets/Flags

...

The Flags dataset comes from the UCI Machine Learning Repository (University of California, Irvine); more information is available at http://archive.ics.uci.edu/ml/datasets/Flags.

  |===========                                                                              |  12%
| I've stored the data in a variable called flags. If it's been a while since you completed the
| 'lapply and sapply' lesson, you may want to reacquaint yourself with the data by using functions
| like dim(), head(), str(), and summary() when you return to the prompt (>). You can also type
| viewinfo() at the prompt to bring up some documentation for the dataset. Let's get started!

...

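The data is stored in a variable called flags. If it has been a while since the 'lapply and sapply' lesson, get reacquainted with the data at the Prompt (>), stepping out of swirl for a moment if needed, with dim(), head(), str(), and summary(). You can also type viewinfo() to bring up documentation for the dataset. Let's get started.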

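The information shown by viewinfo() is in a file called flag.names.txt; its contents appear just before the notes for this lesson, at the end of the previous lesson. The Console Log of dim(), head(), str(), and summary() is included at the end of this lesson for reference.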

  |==============                                                                           |  16%
| As you saw in the last lesson, the unique() function returns a vector of the unique values
| contained in the object passed to it. Therefore, sapply(flags, unique) returns a list containing
| one vector of unique values for each column of the flags dataset. Try it again now.

As in the previous lesson, unique() returns a vector of the distinct values in the object passed to it. So sapply(flags, unique) returns a list holding one vector of distinct values for each column of the flags dataset. Try it.

> sapply(flags, unique)
$name
  [1] Afghanistan              Albania                  Algeria                 
  [4] American-Samoa           Andorra                  Angola                  
  [7] Anguilla                 Antigua-Barbuda          Argentina               
 [10] Argentine                Australia                Austria                 
 [13] Bahamas                  Bahrain                  Bangladesh              
 [16] Barbados                 Belgium                  Belize                  
 [19] Benin                    Bermuda                  Bhutan                  
 [22] Bolivia                  Botswana                 Brazil                  
 [25] British-Virgin-Isles     Brunei                   Bulgaria                
 [28] Burkina                  Burma                    Burundi                 
 [31] Cameroon                 Canada                   Cape-Verde-Islands      
 [34] Cayman-Islands           Central-African-Republic Chad                    
 [37] Chile                    China                    Colombia                
 [40] Comorro-Islands          Congo                    Cook-Islands            
 [43] Costa-Rica               Cuba                     Cyprus                  
 [46] Czechoslovakia           Denmark                  Djibouti                
 [49] Dominica                 Dominican-Republic       Ecuador                 
 [52] Egypt                    El-Salvador              Equatorial-Guinea       
 [55] Ethiopia                 Faeroes                  Falklands-Malvinas      
 [58] Fiji                     Finland                  France                  
 [61] French-Guiana            French-Polynesia         Gabon                   
 [64] Gambia                   Germany-DDR              Germany-FRG             
 [67] Ghana                    Gibraltar                Greece                  
 [70] Greenland                Grenada                  Guam                    
 [73] Guatemala                Guinea                   Guinea-Bissau           
 [76] Guyana                   Haiti                    Honduras                
 [79] Hong-Kong                Hungary                  Iceland                 
 [82] India                    Indonesia                Iran                    
 [85] Iraq                     Ireland                  Israel                  
 [88] Italy                    Ivory-Coast              Jamaica                 
 [91] Japan                    Jordan                   Kampuchea               
 [94] Kenya                    Kiribati                 Kuwait                  
 [97] Laos                     Lebanon                  Lesotho                 
[100] Liberia                  Libya                    Liechtenstein           
[103] Luxembourg               Malagasy                 Malawi                  
[106] Malaysia                 Maldive-Islands          Mali                    
[109] Malta                    Marianas                 Mauritania              
[112] Mauritius                Mexico                   Micronesia              
[115] Monaco                   Mongolia                 Montserrat              
[118] Morocco                  Mozambique               Nauru                   
[121] Nepal                    Netherlands              Netherlands-Antilles    
[124] New-Zealand              Nicaragua                Niger                   
[127] Nigeria                  Niue                     North-Korea             
[130] North-Yemen              Norway                   Oman                    
[133] Pakistan                 Panama                   Papua-New-Guinea        
[136] Parguay                  Peru                     Philippines             
[139] Poland                   Portugal                 Puerto-Rico             
[142] Qatar                    Romania                  Rwanda                  
[145] San-Marino               Sao-Tome                 Saudi-Arabia            
[148] Senegal                  Seychelles               Sierra-Leone            
[151] Singapore                Soloman-Islands          Somalia                 
[154] South-Africa             South-Korea              South-Yemen             
[157] Spain                    Sri-Lanka                St-Helena               
[160] St-Kitts-Nevis           St-Lucia                 St-Vincent              
[163] Sudan                    Surinam                  Swaziland               
[166] Sweden                   Switzerland              Syria                   
[169] Taiwan                   Tanzania                 Thailand                
[172] Togo                     Tonga                    Trinidad-Tobago         
[175] Tunisia                  Turkey                   Turks-Cocos-Islands     
[178] Tuvalu                   UAE                      Uganda                  
[181] UK                       Uruguay                  US-Virgin-Isles         
[184] USA                      USSR                     Vanuatu                 
[187] Vatican-City             Venezuela                Vietnam                 
[190] Western-Samoa            Yugoslavia               Zaire                   
[193] Zambia                   Zimbabwe                
194 Levels: Afghanistan Albania Algeria American-Samoa Andorra Angola Anguilla ... Zimbabwe

$landmass
[1] 5 3 4 6 1 2

$zone
[1] 1 3 2 4

$area
  [1]   648    29  2388     0  1247  2777  7690    84    19     1   143    31    23   113    47
 [16]  1099   600  8512     6   111   274   678    28   474  9976     4   623  1284   757  9561
 [31]  1139     2   342    51   115     9   128    43    22    49   284  1001    21  1222    12
 [46]    18   337   547    91   268    10   108   249   239   132  2176   109   246    36   215
 [61]   112    93   103  3268  1904  1648   435    70   301   323    11   372    98   181   583
 [76]   236    30  1760     3   587   118   333  1240  1031  1973  1566   447   783   140    41
 [91]  1267   925   121   195   324   212   804    76   463   407  1285   300   313    92   237
[106]    26  2150   196    72   637  1221    99   288   505    66  2506    63    17   450   185
[121]   945   514    57     5   164   781   245   178  9363 22402    15   912   256   905   753
[136]   391

$population
 [1]   16    3   20    0    7   28   15    8   90   10    1    6  119    9   35    4   24    2   11
[20] 1008    5   47   31   54   17   61   14  684  157   39   57  118   13   77   12   56   18   84
[39]   48   36   22   29   38   49   45  231  274   60

$language
 [1] 10  6  8  1  2  4  3  5  7  9

$religion
[1] 2 6 1 0 5 3 4 7

$bars
[1] 0 2 3 1 5

$stripes
 [1]  3  0  2  1  5  9 11 14  4  6 13  7

$colours
[1] 5 3 2 8 6 4 7 1

$red
[1] 1 0

$green
[1] 1 0

$blue
[1] 0 1

$gold
[1] 1 0

$white
[1] 1 0

$black
[1] 1 0

$orange
[1] 0 1

$mainhue
[1] green  red    blue   gold   white  orange black  brown 
Levels: black blue brown gold green orange red white

$circles
[1] 0 1 4 2

$crosses
[1] 0 1 2

$saltires
[1] 0 1

$quarters
[1] 0 1 4

$sunstars
 [1]  1  0  6 22 14  3  4  5 15 10  7  2  9 50

$crescent
[1] 0 1

$triangle
[1] 0 1

$icon
[1] 1 0

$animate
[1] 0 1

$text
[1] 0 1

$topleft
[1] black  red    green  blue   white  orange gold  
Levels: black blue gold green orange red white

$botright
[1] green  red    white  black  blue   gold   orange brown 
Levels: black blue brown gold green orange red white


| Keep working like that and you'll get there!

Keep going like that.

  |==================                                                                       |  20%
| What if you had forgotten how unique() works and mistakenly thought it returns the *number* of
| unique values contained in the object passed to it? Then you might have incorrectly expected
| sapply(flags, unique) to return a numeric vector, since each element of the list returned would
| contain a single number and sapply() could then simplify the result to a vector.

...

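Suppose you had forgotten how unique() works and thought it returned the number of distinct values in the object passed to it. Each element of the result would then be a single number, so you would expect sapply(flags, unique) to simplify the result to a numeric vector.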

  |=====================                                                                    |  24%
| When working interactively (at the prompt), this is not much of a problem, since you see the
| result immediately and will quickly recognize your mistake. However, when working
| non-interactively (e.g. writing your own functions), a misunderstanding may go undetected and
| cause incorrect results later on. Therefore, you may wish to be more careful and that's where
| vapply() is useful.

...

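When you work interactively at the Prompt, this matters little: you see the result at once and notice the mistake. When you work non-interactively, for example when writing your own functions, a misunderstanding can go undetected and cause incorrect results later. That calls for more care, and this is where vapply() helps.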

  |=========================                                                                |  28%
| Whereas sapply() tries to 'guess' the correct format of the result, vapply() allows you to
| specify it explicitly. If the result doesn't match the format you specify, vapply() will throw
| an error, causing the operation to stop. This can prevent significant problems in your code that
| might be caused by getting unexpected return values from sapply().

...

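sapply() guesses the format of the result, whereas vapply() lets you specify it explicitly. If the result does not match the format you specify, vapply() throws an error and the operation stops. This prevents the serious problems that unexpected return values from sapply() could cause.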
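The contrast can be sketched on a toy data frame. The data frame df and its columns are assumptions made for illustration, since the flags dataset is only available inside swirl. The error is caught with tryCatch() so that the script keeps running.

```r
# Hypothetical toy data frame standing in for the flags dataset
df <- data.frame(a = c(1, 1, 2), b = c(3, 4, 5))

# The columns have 2 and 3 distinct values, so sapply() cannot simplify
# and quietly returns a list.
u <- sapply(df, unique)
print(is.list(u))

# Counting distinct values with a declared result format: one integer per column
n <- vapply(df, function(col) length(unique(col)), integer(1))
print(n)

# Declaring the wrong format makes vapply() stop with an error
res <- tryCatch(
  vapply(df, unique, numeric(1)),
  error = function(e) conditionMessage(e)
)
print(res)
```

Inside a function, the vapply() error surfaces the wrong assumption at once, where sapply() would pass an unexpected list along.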

  |============================                                                             |  32%
| Try vapply(flags, unique, numeric(1)), which says that you expect each element of the result to
| be a numeric vector of length 1. Since this is NOT actually the case, YOU WILL GET AN ERROR.
| Once you get the error, type ok() to continue to the next question.

Enter vapply(flags, unique, numeric(1)). This declares that each element of the result should be a numeric vector of length 1. That is not actually the case, so an error occurs. After the error, type ok() to continue to the next question.

> vapply(flags, unique, numeric(1))
Error in vapply(flags, unique, numeric(1)) : values must be length 1,
 but FUN(X[[1]]) result is length 194
[The same error message in a Japanese locale] vapply(flags, unique, numeric(1)) でエラー: 
   値の長さは 1 でなければなりません、 
 しかし、FUN(X[[1]]) の結果の長さが 194 です 

> ok()

| You are doing so well!

You are doing very well.

  |================================                                                         |  36%
| Recall from the previous lesson that sapply(flags, class) will return a character vector
| containing the class of each column in the dataset. Try that again now to see the result.

As in the previous lesson, sapply(flags, class) returns a character vector holding the class of each column of the dataset. Try it again.

> sapply(flags, class)
      name   landmass       zone       area population   language   religion       bars    stripes 
  "factor"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer" 
   colours        red      green       blue       gold      white      black     orange    mainhue 
 "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"   "factor" 
   circles    crosses   saltires   quarters   sunstars   crescent   triangle       icon    animate 
 "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer" 
      text    topleft   botright 
 "integer"   "factor"   "factor" 

| All that hard work is paying off!

Hard work pays off.

  |====================================                                                     |  40%
| If we wish to be explicit about the format of the result we expect, we can use vapply(flags,
| class, character(1)). The 'character(1)' argument tells R that we expect the class function to
| return a character vector of length 1 when applied to EACH column of the flags dataset. Try it
| now.

To be explicit about the expected format of the result, use vapply(flags, class, character(1)). The character(1) argument tells R that applying the class function to each column of the flags dataset should return a character vector of length 1. Try it.

> vapply(flags, class, character(1))
      name   landmass       zone       area population   language   religion       bars    stripes 
  "factor"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer" 
   colours        red      green       blue       gold      white      black     orange    mainhue 
 "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"   "factor" 
   circles    crosses   saltires   quarters   sunstars   crescent   triangle       icon    animate 
 "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer" 
      text    topleft   botright 
 "integer"   "factor"   "factor" 

| You nailed it! Good job!

Well done.

  |=======================================                                                  |  44%
| Note that since our expectation was correct (i.e. character(1)), the vapply() result is
| identical to the sapply() result -- a character vector of column classes.

...

Because our expectation (character(1)) was correct, vapply() returned exactly the same result as sapply(): a character vector of column classes.

  |===========================================                                              |  48%
| You might think of vapply() as being 'safer' than sapply(), since it requires you to specify the
| format of the output in advance, instead of just allowing R to 'guess' what you wanted. In
| addition, vapply() may perform faster than sapply() for large datasets. However, when doing data
| analysis interactively (at the prompt), sapply() saves you some typing and will often be good
| enough.

...

vapply() can be considered safer than sapply(), because you specify the output format in advance instead of letting R guess it. In addition, vapply() may run faster than sapply() on large datasets. When working interactively at the Prompt, however, sapply() saves typing and is often good enough.

  |==============================================                                           |  52%
| As a data analyst, you'll often wish to split your data up into groups based on the value of
| some variable, then apply a function to the members of each group. The next function we'll look
| at, tapply(), does exactly that.

...

In data analysis you will often want to split the data into groups by the value of some variable and then apply a function to each group. The next function, tapply(), does exactly that.

  |==================================================                                       |  56%
| Use ?tapply to pull up the documentation.

Use ?tapply to display the documentation.

> ?tapply

| You are amazing!

Impressive.

  |=====================================================                                    |  60%
| The 'landmass' variable in our dataset takes on integer values between 1 and 6, each of which
| represents a different part of the world. Use table(flags$landmass) to see how many
| flags/countries fall into each group.

The 'landmass' variable takes integer values from 1 to 6, each representing a different part of the world. Use table(flags$landmass) to see how many flags/countries fall into each group.

> table(flags$landmass)

 1  2  3  4  5  6 
31 17 35 52 39 20 

| That's a job well done!

Well done.

  |=========================================================                                |  64%
| The 'animate' variable in our dataset takes the value 1 if a country's flag contains an animate
| image (e.g. an eagle, a tree, a human hand) and 0 otherwise. Use table(flags$animate) to see how
| many flags contain an animate image.

The 'animate' variable is 1 if the country's flag contains an animate image (for example an eagle, a tree, or a human hand) and 0 otherwise. Use table(flags$animate) to see how many flags contain an animate image.

> table(flags$animate)

  0   1 
155  39 

| That's a job well done!

Well done.

  |=============================================================                            |  68%
| This tells us that 39 flags contain an animate object (animate = 1) and 155 do not (animate =
| 0).

...

So 39 flags contain an animate object (animate = 1) and 155 do not (animate = 0).

  |================================================================                         |  72%
| If you take the arithmetic mean of a bunch of 0s and 1s, you get the proportion of 1s. Use
| tapply(flags$animate, flags$landmass, mean) to apply the mean function to the 'animate' variable
| separately for each of the six landmass groups, thus giving us the proportion of flags
| containing an animate image WITHIN each landmass group.

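The arithmetic mean of a set of 0s and 1s is the proportion of 1s. Use tapply(flags$animate, flags$landmass, mean) to take the mean of the 'animate' variable separately for each of the six landmass groups, which gives the proportion of flags containing an animate image within each group.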

> tapply(flags$animate, flags$landmass, mean)
        1         2         3         4         5         6 
0.4193548 0.1764706 0.1142857 0.1346154 0.1538462 0.3000000 

| Perseverance, that's the answer.

You persevered, and that is the answer.

  |====================================================================                     |  76%
| The first landmass group (landmass = 1) corresponds to North America and contains the highest
| proportion of flags with an animate image (0.4194).

...

The first landmass group (landmass = 1) is North America, and it has the highest proportion of flags with an animate image (0.4194).
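To make the split-apply-combine step concrete, here is a sketch on two short hypothetical vectors (the names animate and landmass only mirror the flags columns; the values are made up). It also spells out the same computation with split() and sapply(), which is one way to picture what tapply() does rather than a description of its internals.

```r
# Hypothetical data: 1 = flag has an animate image, grouped by landmass code
animate  <- c(1, 1, 0, 0, 0, 1)
landmass <- c(1, 1, 2, 2, 2, 2)

# Mean of 0/1 values per group = proportion of 1s in that group
prop <- tapply(animate, landmass, mean)
print(prop)

# The same steps written out: split into groups, apply mean, combine
by_hand <- sapply(split(animate, landmass), mean)
print(by_hand)
```

Group 1 holds two 1s (proportion 1), and group 2 holds one 1 out of four values (proportion 0.25).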

  |=======================================================================                  |  80%
| Similarly, we can look at a summary of population values (in round millions) for countries with
| and without the color red on their flag with tapply(flags$population, flags$red, summary).

Similarly, tapply(flags$population, flags$red, summary) summarizes population (in round millions) separately for countries with and without red on their flag.

> tapply(flags$population, flags$red, summary)
$`0`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.00    3.00   27.63    9.00  684.00 

$`1`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0     0.0     4.0    22.1    15.0  1008.0 


| That's correct!

Correct.

  |===========================================================================              |  84%
| What is the median population (in millions) for countries *without* the color red on their flag?

1: 4.0
2: 3.0
3: 0.0
4: 22.1
5: 27.6
6: 9.0

What is the median population (in round millions) of countries without red on their flag? Choose from: 1: 4.0, 2: 3.0, 3: 0.0, 4: 22.1, 5: 27.6, 6: 9.0.

Selection: 2

| You got it!

That is right.

  |==============================================================================           |  88%
| Lastly, use the same approach to look at a summary of population values for each of the six
| landmasses.

Finally, use the same approach to get a summary of population for each of the six landmass groups.

> tapply(flags$population, flags$landmass, summary)
$`1`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.00    0.00   12.29    4.50  231.00 

$`2`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    1.00    6.00   15.71   15.00  119.00 

$`3`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.00    8.00   13.86   16.00   61.00 

$`4`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.000   5.000   8.788   9.750  56.000 

$`5`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    2.00   10.00   69.18   39.00 1008.00 

$`6`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.00    0.00   11.30    1.25  157.00 


| That's the answer I was looking for.

Yes, that is what was expected.

  |==================================================================================       |  92%
| What is the maximum population (in millions) for the fourth landmass group (Africa)?

1: 1010.0
2: 119.0
3: 5.00
4: 56.00
5: 157.00

What is the maximum population (in round millions) in the fourth landmass group (Africa)? Choose from: 1: 1010.0, 2: 119.0, 3: 5.00, 4: 56.00, 5: 157.00.

Selection: 4

| All that hard work is paying off!

Hard work pays off.

  |=====================================================================================    |  96%
| In this lesson, you learned how to use vapply() as a safer alternative to sapply(), which is
| most helpful when writing your own functions. You also learned how to use tapply() to split your
| data into groups based on the value of some variable, then apply a function to each group. These
| functions will come in handy on your quest to become a better data analyst.

...

This lesson showed that vapply() is a safer alternative to sapply(), especially useful when writing your own functions, and that tapply() splits data into groups by the value of some variable and applies a function to each group. Both functions are very handy in data analysis.

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

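Would you like to receive credit on Coursera.org for completing this course?

Choose 1 (No). This has been explained several times, so the details are omitted.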

| Great job!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

Well done.

You have reached the end of the lesson, and swirl returns to the main menu. At the course prompt, choose a course or enter 0 to exit swirl; here 0 was entered (by SL).

| Leaving swirl now. Type swirl() to resume.

swirl is now exiting. Type swirl() to resume.

>
  |====================================================================                     |  77%
| We can also get the names of vect by passing vect as an argument to the names() function. Give
| that a try.

Passing vect as the argument to the names() function also returns its names. Try it.

> names(vect)
[1] "foo"  "bar"  "norf"

| Nice work!

That works fine.

  |=======================================================================                  |  79%
| Alternatively, we can create an unnamed vector vect2 with c(11, 2, NA). Do that now.

Alternatively, create an unnamed vector vect2 with c(11, 2, NA). Try it.

> vect2 <- c(11, 2, NA)

| That's the answer I was looking for.

Yes, that is what was expected.

  |=========================================================================                |  82%
| Then, we can add the `names` attribute to vect2 after the fact with names(vect2) <- c("foo",
| "bar", "norf"). Go ahead.

Next, add the names attribute to vect2 with names(vect2) <- c("foo", "bar", "norf"). Try it.

> names(vect2) <- c("foo", "bar", "norf")

| You are really on a roll!

You are learning fast.

  |===========================================================================              |  85%
| Now, let's check that vect and vect2 are the same by passing them as arguments to the
| identical() function.

Now pass vect and vect2 as arguments to the identical() function to check that they are exactly the same.

> identical(vect, vect2)
[1] TRUE

| Great job!

Well done.

  |==============================================================================           |  87%
| Indeed, vect and vect2 are identical named vectors.

...

Indeed, vect and vect2 are identical named vectors.
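As a brief sketch of why the names matter for identical(), the comparison can be run both before and after the names attribute is assigned. The variable names before and after are introduced here for illustration.

```r
vect  <- c(foo = 11, bar = 2, norf = NA)
vect2 <- c(11, 2, NA)

# Same values, but vect2 has no names attribute yet
before <- identical(vect, vect2)

names(vect2) <- c("foo", "bar", "norf")

# Values and attributes now both match
after <- identical(vect, vect2)

print(before)
print(after)
```

identical() compares attributes as well as values, so the unnamed vector differs from the named one until the names are added.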

  |================================================================================         |  90%
| Now, back to the matter of subsetting a vector by named elements. Which of the following
| commands do you think would give us the second element of vect?

1: vect[bar]
2: vect["2"]
3: vect["bar"]

ベクトルの一部を抽出することに戻りました。vect の2番目の成分を取り出す命令は、どれだと思いますか。

Selection: 3

| Excellent job!

素晴らしい。

  |==================================================================================       |  92%
| Now, try it out.

では、試してください。

> vect["bar"]
bar 
  2 

| Nice work!

それで良いですよ。
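As a hedged illustration of the three quiz options, the sketch below tries each form. The definition of vect is again an assumption based on the values printed in this log.

```r
vect <- c(foo = 11, bar = 2, norf = NA)  # assumed definition

by_name  <- vect["bar"]   # the element named "bar"
by_index <- vect[2]       # the same element, selected by position
no_match <- vect["2"]     # looks for an element *named* "2"; none exists, so NA

# vect[bar] treats bar as a variable; with no such variable it is an error.
unquoted <- tryCatch(vect[bar], error = function(e) "error")

print(by_name)
print(no_match)
print(unquoted)
```

Quoting matters: "bar" is a name to look up inside the vector, while an unquoted bar is an object that R tries to find in the workspace.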

  |====================================================================================     |  95%
| Likewise, we can specify a vector of names with vect[c("foo", "bar")]. Try it out.

同じように、vect[c(“foo”, “bar”)] のように使うこともできます。試してください。

> vect[c("foo", "bar")]
foo bar 
 11   2 

| Keep working like that and you'll get there!

その調子で良いですよ。

  |=======================================================================================  |  97%
| Now you know all four methods of subsetting data from vectors. Different approaches are best in
| different scenarios and when in doubt, try it out!

...

You now know four ways to extract part of a vector. Which one works best depends on the situation; when in doubt, try it out.
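For reference, here is a short sketch of the four subsetting styles side by side: logical vectors, positive integers, negative integers, and names. The vector definition is an assumption matching the values in this log, and each line is written so that all four return the same two elements.

```r
vect <- c(foo = 11, bar = 2, norf = NA)  # assumed definition

s1 <- vect[!is.na(vect)]      # 1. logical vector: keep the non-missing elements
s2 <- vect[c(1, 2)]           # 2. positive integers: keep positions 1 and 2
s3 <- vect[-3]                # 3. negative integers: drop position 3
s4 <- vect[c("foo", "bar")]   # 4. names: keep the named elements

print(s1)
```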

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No

Selection: 2

Coursera.org の終了認定が欲しいですか。

Choose 2 (No). This prompt has been explained several times already, so the explanation is omitted.

| All that practice is paying off!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

練習のおかげですね。

You have reached the end of the lesson. Returning to the main menu.

コースを選択してください。swirl を終了するときは、0 を入力してください。

1: R によるプログラム
2: `swirl` コースのリストへ.

選択:0 (SL が入力)

| Leaving swirl now. Type swirl() to resume.

swirl を終了します。再開するときは、swirl() と入力してください。

> 

How to create a vector like x

I've created for you a vector called x that contains a random ordering of 20 numbers (from a standard normal distribution) and 20 NAs.
sample(c(rnorm(20), rep(NA, 20)), 40)
##  [1]          NA          NA -1.24430456  0.37683183          NA          NA
##  [7]  0.36507697  2.00220983          NA -0.91842341          NA          NA
## [13]  0.69165352          NA -1.01579568          NA          NA -0.92191400
## [19]  2.58328343  0.78966941  1.51246771          NA          NA -0.19314079
## [25] -0.45687956  1.51328796  1.34525827          NA  1.80513215          NA
## [31]          NA -0.79840477          NA  1.73824604          NA  0.06402089
## [37]          NA          NA  0.38357955          NA
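The one-line call above can be unpacked into steps. This is only a sketch: the seed and the intermediate names y and z are assumptions added for illustration, and the seed is there just to make the result reproducible.

```r
set.seed(42)               # hypothetical seed, only for reproducibility

y <- rnorm(20)             # 20 draws from a standard normal distribution
z <- rep(NA, 20)           # 20 missing values
x <- sample(c(y, z), 40)   # shuffle all 40 values into a random order

n_missing <- sum(is.na(x)) # count the NAs in the shuffled vector
print(n_missing)
print(length(x))
```

Because sample() draws without replacement by default, asking for 40 values from a vector of length 40 simply permutes it.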

2.7. Matrices and Data Frames

This lesson extends vectors to matrices and introduces data frames, the most basic data format in R. Make sure you understand data frames well.
The walkthrough starts from swirl(). Installing swirl and running library(swirl) are omitted here; if those steps are unclear, review 1.2 or the beginning of 1.1.

> swirl()

The next steps are the same as before, so translation and commentary are omitted.

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you did then.
| If you are new, call yourself something unique.

What shall I call you? SL

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 7

Selected 7: Matrices and Data Frames (Lesson 7).

  |                                                                                         |   0%

| In this lesson, we'll cover matrices and data frames. Both represent 'rectangular' data types,
| meaning that they are used to store tabular data, with rows and columns.

...

この課では、行列(matrices)と データフレーム(data frame)について学びます。どちらも、長方形(rectangular)をしたデータ型で、行と列からなる表を保存することができます。

  |==                                                                                       |   3%
| The main difference, as you'll see, is that matrices can only contain a single class of data,
| while data frames can consist of many different classes of data.

...

なにが異なるかは、いずれわかりますが、行列は、一種類(クラス)のデータしか持てませんが、データフレームは、いくつもの種類(クラス)のデータを含むことができます。

  |=====                                                                                    |   6%
| Let's create a vector containing the numbers 1 through 20 using the `:` operator. Store the
| result in a variable called my_vector.

1から20からなるベクトルを : で作成し、結果を my_vector という名前に割り当てましょう。

> my_vector <- 1:20

| Keep up the great work!

その調子です。

  |=======                                                                                  |   8%
| View the contents of the vector you just created.

いま作成したベクトルの中身を見てみましょう。

> my_vector
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

| Your dedication is inspiring!

感動ものです。

  |==========                                                                               |  11%
| The dim() function tells us the 'dimensions' of an object. What happens if we do dim(my_vector)?
| Give it a try.

dim() 関数は、オブジェクトの ‘dimensions’ (次元)を教えてくれます。dim(my_vector) とするとどうなりますか。試してください。

> dim(my_vector)
NULL

| Keep up the great work!

その調子です。

  |============                                                                             |  14%
| Clearly, that's not very helpful! Since my_vector is a vector, it doesn't have a `dim` attribute
| (so it's just NULL), but we can find its length using the length() function. Try that now.

あまり助けになりませんね。my_vector は、ベクトルですから、dim という属性を持っていません。(そこで、 NULL ’なし’でした)長さは length() 関数で調べることができます。試してください。

> length(my_vector)
[1] 20

| Excellent job!

素晴らしい。
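A minimal sketch of the point made so far: a plain vector carries no dim attribute, so dim() returns NULL, while length() still reports the number of elements.

```r
my_vector <- 1:20

has_no_dim   <- is.null(dim(my_vector))         # a plain vector has no dim attribute
has_no_attrs <- is.null(attributes(my_vector))  # in fact it has no attributes at all
n            <- length(my_vector)               # number of elements

print(c(has_no_dim, has_no_attrs))
print(n)
```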

  |===============                                                                          |  17%
| Ah! That's what we wanted. But, what happens if we give my_vector a `dim` attribute? Let's give
| it a try. Type dim(my_vector) <- c(4, 5).

それが求めるものでした。では、my_vector に dim 属性を付加したらどうなるでしょうか。試してみましょう。dim(my_vector) <- c(4, 5) と入力します。

> dim(my_vector) <- c(4, 5)

| Great job!

やりましたね。

  |=================                                                                        |  19%
| It's okay if that last command seemed a little strange to you. It should! The dim() function
| allows you to get OR set the `dim` attribute for an R object. In this case, we assigned the
| value c(4, 5) to the `dim` attribute of my_vector.

...

That last command may have looked a little strange, and that is fine. The dim() function can both get and set the dim attribute of an R object. Here we set the dim attribute of my_vector to c(4, 5).

  |====================                                                                     |  22%
| Use dim(my_vector) to confirm that we've set the `dim` attribute correctly.

dim(my_vector) を使って、dim(次元)属性を正しく付加したことを確認してください。

> dim(my_vector)
[1] 4 5

| Excellent job!

素晴らしい。

  |======================                                                                   |  25%
| Another way to see this is by calling the attributes() function on my_vector. Try it now.

他の方法としては、attributes() 関数を使って、my_vector の属性を見ることもできます。試してください。

> attributes(my_vector)
$dim
[1] 4 5

| You are quite good my friend!

いい具合ですね。

  |=========================                                                                |  28%
| Just like in math class, when dealing with a 2-dimensional object (think rectangular table), the
| first number is the number of rows and the second is the number of columns. Therefore, we just
| gave my_vector 4 rows and 5 columns.

...

数学のクラスのように、二次元のオブジェクト(長方形の表のようなもの)を扱うときには、最初の数は「行 (row)」、二番目の数は「列 (column)」を表します。従って、my_vector は、4行、5列 としました。

  |===========================                                                              |  31%
| But, wait! That doesn't sound like a vector any more. Well, it's not. Now it's a matrix. View
| the contents of my_vector now to see what it looks like.

ちょっと待ってください。もう、ベクトルらしくないですね。今や行列です。my_vector の中身を表示させて、どのようになっているか見てみましょう。

> my_vector
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

| That's correct!

正しいです。
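The printed matrix shows that R fills the values column by column. The sketch below checks this on a copy named m (a name chosen here for illustration) by indexing with [row, column].

```r
m <- 1:20
dim(m) <- c(4, 5)        # 4 rows, 5 columns

first_col <- m[, 1]      # the first four values land in column 1
cell_2_3  <- m[2, 3]     # row 2, column 3

print(first_col)
print(cell_2_3)
print(c(nrow(m), ncol(m)))
```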

  |==============================                                                           |  33%
| Now, let's confirm it's actually a matrix by using the class() function. Type class(my_vector)
| to see what I mean.

Let's confirm that it really is a matrix by using the class() function. Enter class(my_vector).

> class(my_vector)
[1] "matrix"

| You're the best!

最高です。
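One caveat about the output above: since R 4.0.0, class() on a matrix returns two values, "matrix" and "array", so a newer installation will not print exactly what this log shows. The sketch below uses checks that should behave the same in either case.

```r
m <- matrix(1:20, nrow = 4, ncol = 5)

cls        <- class(m)                # "matrix" in older R; "matrix" "array" since R 4.0.0
is_mat     <- is.matrix(m)            # TRUE regardless of R version
inherits_m <- inherits(m, "matrix")   # TRUE regardless of R version

print(cls)
print(c(is_mat, inherits_m))
```

For this reason, code that tests class(m) == "matrix" directly is fragile; is.matrix() or inherits() is a safer choice.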

  |================================                                                         |  36%
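| Sure enough, my_vector is now a matrix. We should store it in a new variable that helps us
| remember what it is. Store the value of my_vector in a new variable called my_matrix.

my_vector is now a matrix, so store it under a name that reflects that. Assign the value of my_vector to a new variable called my_matrix.

> my_matrix <- my_vector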

| That's a job well done!

よくできました。

  |===================================                                                      |  39%
| The example that we've used so far was meant to illustrate the point that a matrix is simply an
| atomic vector with a dimension attribute. A more direct method of creating the same matrix uses
| the matrix() function.

...

この例から、行列は、単に atomic vector で次元という属性を持つものであることがわかりました。行列を作成する、もっと直接的な方法は、matrix() 関数を使うことです。

  |=====================================                                                    |  42%
| Bring up the help file for the matrix() function now using the `?` function.

? 関数を使って、matrix() 関数の Help を見てください。

> ?matrix

| All that hard work is paying off!

努力は報われます。

  |========================================                                                 |  44%
| Now, look at the documentation for the matrix function and see if you can figure out how to
| create a matrix containing the same numbers (1-20) and dimensions (4 rows, 5 columns) by calling
| the matrix() function. Store the result in a variable called my_matrix2.

Read the Help page for matrix() and use the function to create a matrix with the same elements (1 to 20) and the same dimensions (4 rows, 5 columns). Store the result in my_matrix2.

> my_matrix2 <- matrix(1:20, nrow = 4, ncol = 5)

| You nailed it! Good job!

見事です。

  |==========================================                                               |  47%
| Finally, let's confirm that my_matrix and my_matrix2 are actually identical. The identical()
| function will tell us if its first two arguments are the same. Try it out.

最後に、my_matrix と my_matrix2 が確かに同じであることを、identical() 関数で確かめましょう。試してください。

> identical(my_matrix, my_matrix2)
[1] TRUE

| Nice work!

それで良いですよ。
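The Help page for matrix() also documents a byrow argument. As a small sketch, the default fills column by column, which is why my_matrix2 matches the dim() approach, whereas byrow = TRUE fills row by row and gives a different matrix.

```r
by_col <- matrix(1:20, nrow = 4, ncol = 5)                # default: fill column by column
by_row <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)  # fill row by row

print(by_col[1, ])   # first row when filled by column
print(by_row[1, ])   # first row when filled by row
same <- identical(by_col, by_row)
print(same)
```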

  |============================================                                             |  50%
| Now, imagine that the numbers in our table represent some measurements from a clinical
| experiment, where each row represents one patient and each column represents one variable for
| which measurements were taken.

...

いま、我々の表の値は、なんらかの、医学実験の測定値を表し、行は一人の患者のデータを表すとします。列は、測定変数を表すと考えましょう。

  |===============================================                                          |  53%
| We may want to label the rows, so that we know which numbers belong to each patient in the
| experiment. One way to do this is to add a column to the matrix, which contains the names of all
| four people.

...

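We want to label the rows so that we know which numbers belong to which patient. One way is to add a column containing the names of the four patients to the matrix.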

  |=================================================                                        |  56%
| Let's start by creating a character vector containing the names of our patients -- Bill, Gina,
| Kelly, and Sean. Remember that double quotes tell R that something is a character string. Store
| the result in a variable called patients.

まず、Bill, Gina, Kelly, と Sean からなる、文字列ベクトルを作成しましょう。文字列であることを R に伝えるには、二重引用符 " で囲むのでしたね。patients という名前を割り当ててください。

> patients <- c("Bill", "Gina", "Kelly", "Sean")

| You got it right!

その通りです。

  |====================================================                                     |  58%
| Now we'll use the cbind() function to 'combine columns'. Don't worry about storing the result in
| a new variable. Just call cbind() with two arguments -- the patients vector and my_matrix.

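Next, use cbind(), which combines columns. There is no need to store the result in a new variable; just call cbind() with two arguments, the patients vector and my_matrix.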

> cbind(patients, my_matrix)
     patients                       
[1,] "Bill"   "1" "5" "9"  "13" "17"
[2,] "Gina"   "2" "6" "10" "14" "18"
[3,] "Kelly"  "3" "7" "11" "15" "19"
[4,] "Sean"   "4" "8" "12" "16" "20"

| That's the answer I was looking for.

そうです。それが、期待したものです。

  |======================================================                                   |  61%
| Something is fishy about our result! It appears that combining the character vector with our
| matrix of numbers caused everything to be enclosed in double quotes. This means we're left with
| a matrix of character strings, which is no good.

...

結果は、ちょっと気持ちが悪いですね。文字列ベクトルと、連結され、行列の成分の数値がすべて、二重引用符で囲まれました。これは、行列が、文字列からなるものになったと言うことです。それは、良くありませんね。

  |=========================================================                                |  64%
| If you remember back to the beginning of this lesson, I told you that matrices can only contain
| ONE class of data. Therefore, when we tried to combine a character vector with a numeric matrix,
| R was forced to 'coerce' the numbers to characters, hence the double quotes.

...

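At the start of this lesson we said that a matrix can hold only one class of data. So when we combined a character vector with a numeric matrix, R was forced to coerce the numbers to character strings, which is why they appear in double quotes.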

  |===========================================================                              |  67%
| This is called 'implicit coercion', because we didn't ask for it. It just happened. But why
| didn't R just convert the names of our patients to numbers? I'll let you ponder that question on
| your own.

...

This is called implicit coercion, because it happened without our asking for it. But why didn't R convert the patient names to numbers instead? Think about that yourself.
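One way to explore the question is to try both conversions by hand. The sketch below is only a hint, not an official answer from the lesson: any number can be written as text, but a name such as "Bill" has no numeric value.

```r
patients  <- c("Bill", "Gina", "Kelly", "Sean")
my_matrix <- matrix(1:20, nrow = 4, ncol = 5)

combined    <- cbind(patients, my_matrix)
type_before <- typeof(my_matrix)   # storage type of the numeric matrix
type_after  <- typeof(combined)    # storage type after combining with names

num_to_text <- as.character(17)                       # always possible
text_to_num <- suppressWarnings(as.numeric("Bill"))   # not possible: gives NA

print(c(type_before, type_after))
print(num_to_text)
print(text_to_num)
```

Coercing toward character loses no information, while coercing toward numbers would turn every name into NA.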

  |==============================================================                           |  69%
| So, we're still left with the question of how to include the names of our patients in the table
| without destroying the integrity of our numeric data. Try the following -- my_data <-
| data.frame(patients, my_matrix)

So we still have not solved the problem of adding the patient names to the table without damaging the numeric data. Try the following: my_data <- data.frame(patients, my_matrix)

> my_data <- data.frame(patients, my_matrix)

| Keep working like that and you'll get there!

その調子で良いですよ。

  |================================================================                         |  72%
| Now view the contents of my_data to see what we've come up with.

Now look at the contents of my_data.

> my_data
  patients X1 X2 X3 X4 X5
1     Bill  1  5  9 13 17
2     Gina  2  6 10 14 18
3    Kelly  3  7 11 15 19
4     Sean  4  8 12 16 20

| Excellent work!

素晴らしい。

  |===================================================================                      |  75%
| It looks like the data.frame() function allowed us to store our character vector of names right
| alongside our matrix of numbers. That's exactly what we were hoping for!

...

どうやら、data.frame() 関数は、文字列ベクトルを、数からなる行列と一緒に保存することができたようです。それが、まさに、希望していた事です。
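To see that each column keeps its own class, the classes can be listed column by column. This is a sketch; note that the class of the patients column depends on the R version, because the default of stringsAsFactors changed in R 4.0.0 (factor before, character since).

```r
patients  <- c("Bill", "Gina", "Kelly", "Sean")
my_matrix <- matrix(1:20, nrow = 4, ncol = 5)
my_data   <- data.frame(patients, my_matrix)

col_classes <- sapply(my_data, class)  # one class per column
print(col_classes)
```

Unlike cbind(), data.frame() leaves the five numeric columns as integers.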

  |=====================================================================                    |  78%
| Behind the scenes, the data.frame() function takes any number of arguments and returns a single
| object of class `data.frame` that is composed of the original objects.

...

Behind the scenes, data.frame() takes any number of arguments and combines them into a single object of class data.frame.

  |========================================================================                 |  81%
| Let's confirm this by calling the class() function on our newly created data frame.

このことを、いま作成したデータフレームについて、class() 関数で確かめてみましょう。

> class(my_data)
[1] "data.frame"

| Great job!

やりましたね。

  |==========================================================================               |  83%
| It's also possible to assign names to the individual rows and columns of a data frame, which
| presents another possible way of determining which row of values in our table belongs to each
| patient.

...

各行や、各列に名前をつけることができます。それによって、表のどの行の値が、どの患者のものかわかります。

  |=============================================================================            |  86%
| However, since we've already solved that problem, let's solve a different problem by assigning
| names to the columns of our data frame so that we know what type of measurement each column
| represents.

...

この課題はすでに解決しましたから、異なるもの、つまり、データフレームの列に、測定値の名前を付けてみましょう。

  |===============================================================================          |  89%
| Since we have six columns (including patient names), we'll need to first create a vector
| containing one element for each column. Create a character vector called cnames that contains
| the following values (in order) -- "patient", "age", "weight", "bp", "rating", "test".

Counting the patient names, there are six columns, so first create a vector with one element per column. Create a character vector called cnames containing "patient", "age", "weight", "bp", "rating", "test" in this order.

> cnames <- c("patient", "age", "weight", "bp", "rating", "test")

| Great job!

やりましたね。

  |==================================================================================       |  92%
| Now, use the colnames() function to set the `colnames` attribute for our data frame. This is
| similar to the way we used the dim() function earlier in this lesson.

では、colnames() 関数を使って、データフレームに、colnames という属性を付加してください。これは、この課の中で、以前、dim() 関数を使ったときと同じです。

> colnames(my_data) <- cnames

| Keep working like that and you'll get there!

その調子で良いですよ。

  |====================================================================================     |  94%
| Let's see if that got the job done. Print the contents of my_data.

では、うまくいっているか、my_data の中身を表示させてください。

> my_data
  patient age weight bp rating test
1    Bill   1      5  9     13   17
2    Gina   2      6 10     14   18
3   Kelly   3      7 11     15   19
4    Sean   4      8 12     16   20

| You got it!

その通りです。
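Once the columns have names, they can be used to pull out values. The sketch below rebuilds my_data and then reads one column and one cell; the $ and [row, column] forms used here go slightly beyond what this lesson covers and are shown only as an illustration.

```r
patients  <- c("Bill", "Gina", "Kelly", "Sean")
my_matrix <- matrix(1:20, nrow = 4, ncol = 5)
my_data   <- data.frame(patients, my_matrix)

cnames <- c("patient", "age", "weight", "bp", "rating", "test")
colnames(my_data) <- cnames

ages    <- my_data$age                               # a whole column, by name
gina_bp <- my_data[my_data$patient == "Gina", "bp"]  # one cell: Gina's row, bp column

print(ages)
print(gina_bp)
```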

  |=======================================================================================  |  97%
| In this lesson, you learned the basics of working with two very important and common data
| structures -- matrices and data frames. There's much more to learn and we'll be covering more
| advanced topics, particularly with respect to data frames, in future lessons.

...

この課では、二つの標準的なデータ構造について学びました。matrices(行列)と data frame(データフレーム)ですね。特に、データフレームについて、今後、もっと詳しく学びます。

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

Coursera.org の終了認定が欲しいですか。

Choose 1 (No). This prompt has been explained several times already, so the explanation is omitted.

| You're the best!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

最高です。

You have reached the end of the lesson. Returning to the main menu.

コースを選択してください。swirl を終了するときは、0 を入力してください。

1: R によるプログラム
2: `swirl` コースのリストへ.

選択:0 (SL が入力)

| Leaving swirl now. Type swirl() to resume.

swirl を終了します。再開するときは、swirl() と入力してください。

>

2.8. Logic

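This lesson continues the study of logic, which was touched on briefly in Lesson 4. You will need it later to extract the rows of a data frame that satisfy a condition and to control the flow of a program. It is a little involved, so work through the exercises while experimenting, either with pencil and paper or in a second RStudio window.
The walkthrough starts from swirl(). Installing swirl and running library(swirl) are omitted here; if those steps are unclear, review 1.2 or the beginning of 1.1.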

> swirl()

The next steps are the same as before, so translation and commentary are omitted.

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you did then.
| If you are new, call yourself something unique.

What shall I call you? SL

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 8

Selected 8: Logic (Lesson 8).

  |                                                                                         |   0%

| This lesson is meant to be a short introduction to logical operations in R.

...

この課では、R の論理演算について、簡単に学びます。

  |==                                                                                       |   2%
| There are two logical values in R, also called boolean values. They are TRUE and FALSE. In R you
| can construct logical expressions which will evaluate to either TRUE or FALSE.

...

R has two logical values, TRUE and FALSE, also called boolean values. Logical expressions in R evaluate to either TRUE or FALSE.

  |===                                                                                      |   4%
| Many of the questions in this lesson will involve evaluating logical expressions. It may be
| useful to open up a second R terminal where you can experiment with some of these expressions.

...

この課の練習問題には、論理値を計算するものがたくさんあります。もう一つ、R の端末を開いて、実験しながら、進めることをお勧めします。

  |=====                                                                                    |   6%
| Creating logical expressions requires logical operators. You're probably familiar with
| arithmetic operators like `+`, `-`, `*`, and `/`. The first logical operator we are going to
| discuss is the equality operator, represented by two equals signs `==`. Use the equality
| operator below to find out if TRUE is equal to TRUE.

論理式には、論理演算子を使います。数の計算で、+, -, *, や / を使うことには慣れていると思います。最初に学ぶのは、等価であることを示す論理の等号 == です。この等号を用いて、TRUE が TRUE と等価であることを確認してください。

> TRUE == TRUE
[1] TRUE

| You are really on a roll!

どんどん学んでいますね。

  |=======                                                                                  |   8%
| Just like arithmetic, logical expressions can be grouped by parenthesis so that the entire
| expression (TRUE == TRUE) == TRUE evaluates to TRUE.

...

(数の)算術計算のときと同じように、論理式は、括弧でグループにします。(TRUE == TRUE) == TRUE は、TRUE となります。

  |=========                                                                                |  10%
| To test out this property, try evaluating (FALSE == TRUE) == FALSE .

この性質を確認するために、(FALSE == TRUE) == FALSE を試してください。

> (FALSE == TRUE) == FALSE
[1] TRUE

| You got it right!

その通りです。

  |==========                                                                               |  12%
| The equality operator can also be used to compare numbers. Use `==` to see if 6 is equal to 7.

等価演算子は、数の比較にも使えます。== を使って、6 が 7 と等しいか確認してください。

> 6 == 7
[1] FALSE

| You're the best!

最高です。

  |============                                                                             |  13%
| The previous expression evaluates to FALSE because 6 is less than 7. Thankfully, there are
| inequality operators that allow us to test if a value is less than or greater than another
| value.

...

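The previous expression is FALSE because 6 is less than 7, so the two are not equal. There are also inequality operators that test whether one value is less than or greater than another.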

  |==============                                                                           |  15%
| The less than operator `<` tests whether the number on the left side of the operator (called the
| left operand) is less than the number on the right side of the operator (called the right
| operand). Write an expression to test whether 6 is less than 7.

よりちいさいを表す演算子は < で、演算子の左がわの数(左項 the left operand)が右側の数(右項 the right operand)より小さいかを調べます。6 は 7 より小さいは、どう書きますか。

> 6 < 7
[1] TRUE

| Excellent job!

素晴らしい。

  |===============                                                                          |  17%
| There is also a less-than-or-equal-to operator `<=` which tests whether the left operand is less
| than or equal to the right operand. Write an expression to test whether 10 is less than or equal
| to 10.

他に、左項は右項より小さいか等しい(以下)を表す、<= もあります。10 は 10 以下かを表す論理式を書いてください。

> 10 <= 10
[1] TRUE

| Keep up the great work!

その調子です。

  |=================                                                                        |  19%
| Keep in mind that there are the corresponding greater than `>` and greater-than-or-equal-to `>=`
| operators.

...

これらと対応して、より大きいを表す > と より大きいか等しい(以上)を表す >= もあります。

  |===================                                                                      |  21%
| Which of the following evaluates to FALSE?

1: 7 == 7
2: 9 >= 10
3: 6 < 8
4: 0 > -36

値が、FALSE となるのは、どれですか。

Selection: 2

| Nice work!

それで良いですよ。

  |=====================                                                                    |  23%
| Which of the following evaluates to TRUE?

1: 7 == 9
2: 57 < 8
3: -6 > -7
4: 9 >= 10

値が、TRUE となるのは、どれですか。

Selection: 3

| Keep up the great work!

その調子です。

  |======================                                                                   |  25%
| The next operator we will discuss is the 'not equals' operator represented by `!=`. Not equals
| tests whether two values are unequal, so TRUE != FALSE evaluates to TRUE. Like the equality
| operator, `!=` can also be used with numbers. Try writing an expression to see if 5 is not equal
| to 7.

次は、等しくない (‘not equals’) を表す演算子 != です。値が等しくないときに、TRUE となりますから、TRUE != FALSE は、TRUE です。等号と同じように、!= も数にも使えます。5 は 7 とは等しくないことを、確認してください。

> 5 != 7
[1] TRUE

| You are really on a roll!

どんどん学んでいますね。

  |========================                                                                 |  27%
| In order to negate boolean expressions you can use the NOT operator. An exclamation point `!`
| will cause !TRUE (say: not true) to evaluate to FALSE and !FALSE (say: not false) to evaluate to
| TRUE. Try using the NOT operator and the equals operator to find the opposite of whether 5 is
| equal to 7.

ブール表現(boolean expressions (二値)論理式)を否定するには、否定演算子 ! を使います。!TRUE (not true と読みます) は、FALSE となり、!FALSE (not false と読みます) は、TRUE となります。否定演算子と等号を使って、5 は 7 と等しいの否定はどうなるか見てください。

> !(5 == 7)
[1] TRUE

| Keep up the great work!

その調子です。
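As a quick sketch, negating an equality test gives the same result as the not-equals operator, and both work element by element on a vector. The vector v is a hypothetical example, not part of the lesson.

```r
a <- !(5 == 7)   # NOT applied to an equality test
b <- 5 != 7      # the not-equals operator

v <- c(3, 7, 10)          # hypothetical example vector
not_seven <- !(v == 7)    # element-wise negation
print(c(a, b))
print(not_seven)
```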

  |==========================                                                               |  29%
| Let's take a moment to review. The equals operator `==` tests whether two boolean values or
| numbers are equal, the not equals operator `!=` tests whether two boolean values or numbers are
| unequal, and the NOT operator `!` negates logical expressions so that TRUE expressions become
| FALSE and FALSE expressions become TRUE.

...

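Let's review. The equality operator == tests whether two boolean values or two numbers are equal, and the not-equals operator != tests whether they are unequal. The NOT operator ! negates a logical expression: TRUE becomes FALSE and FALSE becomes TRUE.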

  |===========================                                                              |  31%
| Which of the following evaluates to FALSE?

1: !(0 >= -1)
2: !FALSE
3: 7 != 8
4: 9 < 10

値が、FALSE となるのは、どれですか。

Selection: 1

| You nailed it! Good job!

見事です。

  |=============================                                                            |  33%
| What do you think the following expression will evaluate to?: (TRUE != FALSE) == !(6 == 7)

1: Can there be objective truth when programming?
2: TRUE
3: FALSE
4: %>%

次の論理式の値はどうなると思いますか。(TRUE != FALSE) == !(6 == 7)

Selection: 2

| You're the best!

最高です。

  |===============================                                                          |  35%
| At some point you may need to examine relationships between multiple logical expressions. This
| is where the AND operator and the OR operator come in.

...

いくつかの論理式の値の関係を調べることが生じると思います。そのときに使うのが、 AND 演算子と、OR 演算子です。

  |=================================                                                        |  37%
| Let's look at how the AND operator works. There are two AND operators in R, `&` and `&&`. Both
| operators work similarly, if the right and left operands of AND are both TRUE the entire
| expression is TRUE, otherwise it is FALSE. For example, TRUE & TRUE evaluates to TRUE. Try
| typing FALSE & FALSE to how it is evaluated.

Let's see how the AND operator works. R has two AND operators, & and &&. Both give TRUE when the left and right operands are both TRUE, and FALSE otherwise. For example, TRUE & TRUE is TRUE. Type FALSE & FALSE to see how it is evaluated.

> FALSE & FALSE
[1] FALSE

| You nailed it! Good job!

見事です。

  |==================================                                                       |  38%
| You can use the `&` operator to evaluate AND across a vector. The `&&` version of AND only
| evaluates the first member of a vector. Let's test both for practice. Type the expression TRUE &
| c(TRUE, FALSE, FALSE).

& 演算子は、ベクトルのすべての要素に適用されますが、もう一つの AND 演算子 && は、ベクトルの最初の要素にだけ適用されます。練習のため、両方とも確認してみましょう。TRUE & c(TRUE, FALSE, FALSE) と入力してください。

> TRUE & c(TRUE, FALSE, FALSE)
[1]  TRUE FALSE FALSE

| That's a job well done!

よくできました。

  |====================================                                                     |  40%
| What happens in this case is that the left operand `TRUE` is recycled across every element in
| the vector of the right operand. This is the equivalent statement as c(TRUE, TRUE, TRUE) &
| c(TRUE, FALSE, FALSE).

...

この場合には、左項の TRUE は、右項のベクトルのすべての要素に対応して、再利用されたのでした。したがって、c(TRUE, TRUE, TRUE) & c(TRUE, FALSE, FALSE) を計算したのと同じことになります。
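The recycling described above can be checked directly, and the same element-wise & is what makes range tests on a vector possible. The vector x below is a hypothetical example.

```r
recycled <- TRUE & c(TRUE, FALSE, FALSE)
explicit <- c(TRUE, TRUE, TRUE) & c(TRUE, FALSE, FALSE)
same <- identical(recycled, explicit)

x <- c(1, 5, 9)               # hypothetical example vector
in_range <- x > 2 & x < 8     # element-wise: is each value between 2 and 8?
print(same)
print(in_range)
```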

  |======================================                                                   |  42%
| Now we'll type the same expression except we'll use the `&&` operator. Type the expression TRUE
| && c(TRUE, FALSE, FALSE).

今度は、同じ論理式で、&& を使ってどうなるか見てみましょう。TRUE && c(TRUE, FALSE, FALSE) と入力してください。

> TRUE && c(TRUE, FALSE, FALSE)
[1] TRUE

| Your dedication is inspiring!

感動ものです。

  |=======================================                                                  |  44%
| In this case, the left operand is only evaluated with the first member of the right operand (the
| vector). The rest of the elements in the vector aren't evaluated at all in this expression.

...

今度は、左項は、右項ベクトルの最初の成分の値のみ使って評価します。それ以外は、まったく利用しません。
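A caution for readers on a newer R: the behaviour shown in this log applies to older versions. Since R 4.2.0, giving && an operand of length greater than one produces a warning, and since R 4.3.0 it is an error. The sketch below captures whichever outcome the installed version gives, then shows two version-independent ways to say what is meant.

```r
v <- c(TRUE, FALSE, FALSE)

# Outcome depends on the R version: TRUE (old), a warning (4.2.x), or an error (4.3.0+).
result <- tryCatch(
  TRUE && v,
  warning = function(w) "warning",
  error   = function(e) "error"
)
print(result)

first_only <- TRUE && v[1]     # explicitly use only the first element
all_of_v   <- TRUE && all(v)   # or reduce the whole vector to one value first
print(c(first_only, all_of_v))
```

Making the intent explicit with v[1], all(), or any() keeps the code working across versions.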

  |=========================================                                                |  46%
| The OR operator follows a similar set of rules. The `|` version of OR evaluates OR across an
| entire vector, while the `||` version of OR only evaluates the first member of a vector.

...

OR 演算子も同様です。OR 演算子の一つである | は、ベクトルすべてに対して評価を行い、もう一つの || は、ベクトルの最初の一つのみ使います。

  |===========================================                                              |  48%
| An expression using the OR operator will evaluate to TRUE if the left operand or the right
| operand is TRUE. If both are TRUE, the expression will evaluate to TRUE, however if neither are
| TRUE, then the expression will be FALSE.

...

OR 演算子は、左項か右項のどちらかが TRUE なら TRUE となります。両方ともが、TRUE のときも、TRUE となり、どちらも、TRUE でなければ FALSE になります。

  |============================================                                             |  50%
| Let's test out the vectorized version of the OR operator. Type the expression TRUE | c(TRUE,
| FALSE, FALSE).

ベクトル形式で OR 演算子の働きを確認しましょう。TRUE | c(TRUE, FALSE, FALSE) はどうなりますか。

> TRUE | c(TRUE, FALSE, FALSE)
[1] TRUE TRUE TRUE

| You got it!

その通りです。

  |==============================================                                           |  52%
| Now let's try out the non-vectorized version of the OR operator. Type the expression TRUE ||
| c(TRUE, FALSE, FALSE).

ベクトルの成分すべてを使うわけではない OR 演算子 || を試しましょう。TRUE || c(TRUE, FALSE, FALSE) はどうなりますか。

> TRUE || c(TRUE, FALSE, FALSE)
[1] TRUE

| You're the best!

最高です。
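A further difference between the two forms is that || and && stop evaluating as soon as the answer is known, whereas | and & always evaluate both sides. The sketch below uses stop() as a deliberately failing right operand to make this visible.

```r
# || stops at the first TRUE, so the right side is never run.
short_or  <- TRUE || stop("never reached")

# && stops at the first FALSE, so the right side is never run.
short_and <- FALSE && stop("never reached")

# | evaluates both sides, so the stop() is reached and raises an error.
vector_or <- tryCatch(TRUE | stop("reached"), error = function(e) "error")

print(c(short_or, short_and))
print(vector_or)
```

This also explains why TRUE || c(TRUE, FALSE, FALSE) still returns TRUE on recent R versions: the right operand is never evaluated.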

  |================================================                                         |  54%
| Logical operators can be chained together just like arithmetic operators. The expressions: `6 !=
| 10 && FALSE && 1 >= 2` or `TRUE || 5 < 9.3 || FALSE` are perfectly normal to see.

...

Logical operators can be chained just like arithmetic operators. Expressions such as 6 != 10 && FALSE && 1 >= 2 or TRUE || 5 < 9.3 || FALSE are perfectly common.

  |==================================================                                       |  56%
| As you may recall, arithmetic has an order of operations and so do logical expressions. All AND
| operators are evaluated before OR operators. Let's look at an example of an ambiguous case.
| Type: 5 > 8 || 6 != 8 && 4 > 3.9

Just as arithmetic has an order of operations, so do logical expressions: AND operators are evaluated before OR operators. Let's look at an ambiguous case. Type 5 > 8 || 6 != 8 && 4 > 3.9.

> 5 > 8 || 6 != 8 && 4 > 3.9
[1] TRUE

| You are really on a roll!

You are learning fast.

  |===================================================                                      |  58%
| Let's walk through the order of operations in the above case. First the left and right operands
| of the AND operator are evaluated. 6 is not equal 8, 4 is greater than 3.9, therefore both
| operands are TRUE so the resulting expression `TRUE && TRUE` evaluates to TRUE. Then the left
| operand of the OR operator is evaluated: 5 is not greater than 8 so the entire expression is
| reduced to FALSE || TRUE. Since the right operand of this expression is TRUE the entire
| expression evaluates to TRUE.

...

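Let's walk through the case above step by step. First the AND part on the right is evaluated. 6 is not equal to 8 and 4 is greater than 3.9, so both operands are TRUE and TRUE && TRUE is TRUE. Next comes the left operand of the OR operator: 5 is not greater than 8, so 5 > 8 is FALSE. The expression therefore reduces to FALSE || TRUE, and since the right operand is TRUE, the whole expression is TRUE.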
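
The precedence rule can be checked by adding parentheses by hand. The sketch below reuses the expression from the exercise and adds one extra pair of expressions (chosen for illustration, not taken from the lesson) where the grouping changes the answer.

```r
# AND binds tighter than OR, so these two lines are the same expression.
implicit <- 5 > 8 || 6 != 8 && 4 > 3.9
explicit <- 5 > 8 || (6 != 8 && 4 > 3.9)

# Forcing the OR to run first can change the result.
and_first <- TRUE || FALSE && FALSE     # read as TRUE || (FALSE && FALSE)
or_first  <- (TRUE || FALSE) && FALSE   # read as TRUE && FALSE

print(c(implicit, explicit, and_first, or_first))
```

When in doubt, writing the parentheses explicitly costs nothing and removes the ambiguity for the reader.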

  |=====================================================                                    |  60%
| Which one of the following expressions evaluates to TRUE?

1: TRUE && 62 < 62 && 44 >= 44
2: FALSE || TRUE && FALSE
3: 99.99 > 100 || 45 < 7.3 || 4 != 4.0
4: TRUE && FALSE || 9 >= 4 && 3 < 6

Which of these evaluates to TRUE?

Selection: 4

| Nice work!

Nice work.

  |=======================================================                                  |  62%
| Which one of the following expressions evaluates to FALSE?

1: !(8 > 4) ||  5 == 5.0 && 7.8 >= 7.79
2: FALSE && 6 >= 6 || 7 >= 8 || 50 <= 49.5
3: FALSE || TRUE && 6 != 4 || 9 > 4
4: 6 >= -9 && !(6 > 7) && !(!TRUE)

Which of these evaluates to FALSE?

Selection: 2

| Great job!

Great job.

  |========================================================                                 |  63%
| Now that you're familiar with R's logical operators you can take advantage of a few functions
| that R provides for dealing with logical expressions.

...

You should now be comfortable with R's logical operators. R also provides a few functions for working with logical expressions.

  |==========================================================                               |  65%
| The function isTRUE() takes one argument. If that argument evaluates to TRUE, the function will
| return TRUE. Otherwise, the function will return FALSE. Try using this function by typing:
| isTRUE(6 > 4)

isTRUE() takes a single argument. If that argument evaluates to TRUE it returns TRUE; otherwise it returns FALSE. Try isTRUE(6 > 4).

> isTRUE(6 > 4)
[1] TRUE

| You're the best!

You're the best.

  |============================================================                             |  67%
| Which of the following evaluates to TRUE?

1: isTRUE(NA)
2: isTRUE(!TRUE)
3: !isTRUE(4 < 3)
4: !isTRUE(8 != 5)
5: isTRUE(3)

Which of these evaluates to TRUE?

Selection: 3

| You got it!

That's right.
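
To see why the other quiz options are not TRUE, it can help to run a few cases side by side. This is only an illustrative sketch; the element names in `checks` and the length-two case are additions, not part of the lesson.

```r
checks <- c(
  na_value   = isTRUE(NA),             # NA is not TRUE
  number     = isTRUE(3),              # a number is not the logical value TRUE
  comparison = isTRUE(6 > 4),          # the comparison evaluates to TRUE
  negated    = !isTRUE(4 < 3),         # 4 < 3 is FALSE, so the negation is TRUE
  long       = isTRUE(c(TRUE, TRUE))   # only a length-one TRUE counts
)
print(checks)
```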

  |==============================================================                           |  69%
| The function identical() will return TRUE if the two R objects passed to it as arguments are
| identical. Try out the identical() function by typing: identical('twins', 'twins')

The identical() function returns TRUE when the two objects passed to it as arguments are identical. Try identical('twins', 'twins').

> identical('twins', 'twins')
[1] TRUE

| You are quite good my friend!

You are doing well.

  |===============================================================                          |  71%
| Which of the following evaluates to TRUE?

1: identical(4, 3.1)
2: identical('hello', 'Hello')
3: identical(5 > 4, 3 < 3.1)
4: !identical(7, 7)

Which of these evaluates to TRUE?

Selection: 3

| You are really on a roll!

You are learning fast.
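
A short sketch of what identical() compares. The first two cases come from the quiz above; the integer-versus-double case is an added illustration, included on the assumption that it is useful to see that identical() also checks the storage type.

```r
same_logical <- identical(5 > 4, 3 < 3.1)     # both sides evaluate to TRUE
case_differs <- identical("hello", "Hello")   # the strings differ in case
type_differs <- identical(1L, 1)              # integer versus double

print(c(same_logical, case_differs, type_differs))
```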

  |=================================================================                        |  73%
| You should also be aware of the xor() function, which takes two arguments. The xor() function
| stands for exclusive OR. If one argument evaluates to TRUE and one argument evaluates to FALSE,
| then this function will return TRUE, otherwise it will return FALSE. Try out the xor() function
| by typing: xor(5 == 6, !FALSE)

You should also know the xor() function, which takes two arguments. The name stands for exclusive OR. It returns TRUE when one argument is TRUE and the other is FALSE, and FALSE otherwise. Type xor(5 == 6, !FALSE).

> xor(5 == 6, !FALSE)
[1] TRUE

| Keep working like that and you'll get there!

Keep going like that.

  |===================================================================                      |  75%
| 5 == 6 evaluates to FALSE, !FALSE evaluates to TRUE, so xor(FALSE, TRUE) evaluates to TRUE. On
| the other hand if the first argument was changed to 5 == 5 and the second argument was unchanged
| then both arguments would have been TRUE, so xor(TRUE, TRUE) would have evaluated to FALSE.

...

5 == 6 evaluates to FALSE and !FALSE evaluates to TRUE, so xor(FALSE, TRUE) evaluates to TRUE. If the first argument were changed to 5 == 5 while the second stayed the same, both arguments would be TRUE, and xor(TRUE, TRUE) would evaluate to FALSE.
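
The full behaviour of xor() fits in a four-row truth table. The sketch below builds it with expand.grid(); the data frame name `inputs` and its column names are illustrative choices.

```r
# All four combinations of two logical inputs (column a varies fastest).
inputs <- expand.grid(a = c(TRUE, FALSE), b = c(TRUE, FALSE))

# xor() is vectorized, so it can be applied to whole columns at once.
inputs$xor <- xor(inputs$a, inputs$b)
print(inputs)
```

Only the rows where exactly one input is TRUE come out TRUE.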

  |====================================================================                     |  77%
| Which of the following evaluates to FALSE?

1: xor(!isTRUE(TRUE), 6 > -1)
2: xor(identical(xor, 'xor'), 7 == 7.0)
3: xor(!!TRUE, !!FALSE)
4: xor(4 >= 9, 8 != 8.0)

Which of these evaluates to FALSE?

Selection: 4

| Your dedication is inspiring!

Inspiring.

  |======================================================================                   |  79%
| For the next few questions, we're going to need to create a vector of integers called ints.
| Create this vector by typing: ints <- sample(10)

For the next few questions we need a vector of integers called ints. Create it by typing ints <- sample(10).

> ints <- sample(10)

| You got it!

That's right.

  |========================================================================                 |  81%
| Now simply display the contents of ints.

Display the contents of ints.

> ints
 [1]  6  4  5  8 10  9  3  2  7  1

| You are quite good my friend!

You are doing well.

  |==========================================================================               |  83%
| The vector `ints` is a random sampling of integers from 1 to 10 without replacement. Let's say
| we wanted to ask some logical questions about contents of ints. If we type ints > 5, we will get
| a logical vector corresponding to whether each element of ints is greater than 5. Try typing:
| ints > 5

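ints is a random sample of the integers from 1 to 10, drawn without replacement so that no number appears twice. Suppose we want to ask logical questions about its contents. Typing ints > 5 gives a logical vector that records, element by element, whether each value is greater than 5. Type ints > 5.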

> ints > 5
 [1]  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE

| Excellent work!

Excellent.

  |===========================================================================              |  85%
| We can use the resulting logical vector to ask other questions about ints. The which() function
| takes a logical vector as an argument and returns the indices of the vector that are TRUE. For
| example which(c(TRUE, FALSE, TRUE)) would return the vector c(1, 3).

...

The resulting logical vector can be used to ask further questions about ints. The which() function takes a logical vector as its argument and returns the indices of the elements that are TRUE. For example, which(c(TRUE, FALSE, TRUE)) returns the vector c(1, 3).

  |=============================================================================            |  87%
| Use the which() function to find the indices of ints that are greater than 7.

Use the which() function to find the indices of the elements of ints that are greater than 7.

> which(ints > 7)
[1] 4 5 6

| Keep up the great work!

Keep it up.
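
Because sample(10) gives a different ordering on every run, the sketch below fixes `ints` to the values shown in this log so that the result is reproducible. The variable names `positions` and `values` are illustrative.

```r
# The sample that appears in this log, typed in by hand.
ints <- c(6, 4, 5, 8, 10, 9, 3, 2, 7, 1)

positions <- which(ints > 7)   # indices where the condition is TRUE
values <- ints[positions]      # the elements found at those indices

print(positions)
print(values)
```

which() answers "where", while indexing with its result answers "what".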

  |===============================================================================          |  88%
| Which of the following commands would produce the indices of the elements in ints that are less
| than or equal to 2?

1: which(ints < 2)
2: which(ints <= 2)
3: ints <= 2
4: ints < 2

Which command gives the indices of the elements of ints that are less than or equal to 2?

Selection: 2

| You're the best!

You're the best.

  |================================================================================         |  90%
| Like the which() function, the functions any() and all() take logical vectors as their argument.
| The any() function will return TRUE if one or more of the elements in the logical vector is
| TRUE. The all() function will return TRUE if every element in the logical vector is TRUE.

...

Like which(), the functions any() and all() take a logical vector as their argument. any() returns TRUE if at least one element of the vector is TRUE, and all() returns TRUE only if every element is TRUE.

  |==================================================================================       |  92%
| Use the any() function to see if any of the elements of ints are less than zero.

Use any() to check whether any element of ints is less than zero.

> any(ints < 0)
[1] FALSE

| You are quite good my friend!

You are doing well.

  |====================================================================================     |  94%
| Use the all() function to see if all of the elements of ints are greater than zero.

Now use all() to check whether every element of ints is greater than zero.

> all(ints > 0)
[1] TRUE

| You got it right!

That's right.

  |======================================================================================   |  96%
| Which of the following evaluates to TRUE?

1: any(ints == 10)
2: all(ints == 10)
3: any(ints == 2.5)
4: all(c(TRUE, FALSE, TRUE))

Which of these evaluates to TRUE?

Selection: 1

| That's a job well done!

Well done.
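
The four quiz options can be evaluated directly. As a sketch, `ints` is again fixed to the values shown in this log rather than drawn with sample(10); since any ordering of 1 to 10 contains exactly one 10, the answers do not depend on the ordering.

```r
ints <- c(6, 4, 5, 8, 10, 9, 3, 2, 7, 1)

any_ten   <- any(ints == 10)              # one element equals 10
all_ten   <- all(ints == 10)              # most elements are not 10
any_frac  <- any(ints == 2.5)             # no element equals 2.5
all_mixed <- all(c(TRUE, FALSE, TRUE))    # one FALSE is enough to fail

print(c(any_ten, all_ten, any_frac, all_mixed))
```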

  |=======================================================================================  |  98%
| That's all for this introduction to logic in R. If you really want to see what you can do with
| logic, check out the control flow lesson!

...

That concludes this introduction to logic in R. If you want to see more of what logic can do, look for the control flow lesson.

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

Do you want credit on Coursera.org for completing this course?

Choose 1. This has been explained several times already, so the details are omitted here.

| That's the answer I was looking for.

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

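The messages above confirm the answer, report that the end of the lesson has been reached, and return to the main menu, where you choose a course or type 0 to exit swirl. The menu offers 1: R Programming and 2: the swirl course repository. Here SL entered 0.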

| Leaving swirl now. Type swirl() to resume.

swirl is now exiting. To resume, type swirl().

>

2.9. Functions

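We have already used many functions. This lesson teaches how to create new ones. It also covers function arguments, which helps when using existing functions as well.
Some exercises, such as those where you create a new function, ask you to edit an R script that swirl has already prepared. In RStudio, a file with the .R extension opens in the pane above the Console. Edit it and save it. (Using the Source pull-down menu also saves the file.) After saving, type submit(). The scripts contain the '#' mark (called the pound sign, or the hash mark); everything after it on a line is a comment. In some exercises you need to delete it.
The walkthrough starts from swirl(). Installing swirl and running library(swirl) are omitted; if you are unsure about them, review section 1.2 or the beginning of 1.1.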

> swirl()

The next steps are the same as before, so the translation and commentary are omitted.

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you did then.
| If you are new, call yourself something unique.

What shall I call you? SL

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 9

9: Functions (Lesson 9) was selected.

  |                                                                                         |   0%

| Functions are one of the fundamental building blocks of the R language. They are small pieces of
| reusable code that can be treated like any other R object.

...

Functions are one of the fundamental building blocks of the R language. They are small pieces of reusable code and can be treated like any other R object.

  |==                                                                                       |   2%
| If you've worked through any other part of this course, you've probably used some functions
| already. Functions are usually characterized by the name of the function followed by
| parentheses.

...

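If you have worked through other lessons in this course, you have probably used some functions already. A function is usually written as its name followed by parentheses ().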

  |====                                                                                     |   4%
| Let's try using a few basic functions just for fun. The Sys.Date() function returns a string
| representing today's date. Type Sys.Date() below and see what happens.

Let's try a few basic functions for fun. Sys.Date() returns today's date. Type Sys.Date() and see what happens.

> Sys.Date()
[1] "2020-05-12"

The date shown is the day this R Markdown file was last revised and rendered to HTML.

| Great job!

Great job.

  |=====                                                                                    |   6%
| Most functions in R return a value. Functions like Sys.Date() return a value based on your
| computer's environment, while other functions manipulate input data in order to compute a return
| value.

...

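Most functions in R return a value. Functions like Sys.Date() return a value taken from your computer's environment, while other functions compute their return value from the input data.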

  |=======                                                                                  |   8%
| The mean() function takes a vector of numbers as input, and returns the average of all of the
| numbers in the input vector. Inputs to functions are often called arguments. Providing arguments
| to a function is also sometimes called passing arguments to that function. Arguments you want to
| pass to a function go inside the function's parentheses. Try passing the argument c(2, 4, 5) to
| the mean() function.

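The mean() function takes a vector of numbers as input and returns the average of all of them. Inputs to a function are called arguments, and providing them is also described as passing arguments to the function. The arguments go inside the parentheses after the function name. Pass the argument c(2, 4, 5) to mean().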

> mean(c(2,4,5))
[1] 3.666667

| That's a job well done!

Well done.

  |=========                                                                                |  10%
| Functions usually take arguments which are variables that the function operates on. For example,
| the mean() function takes a vector as an argument, like in the case of mean(c(2,6,8)). The
| mean() function then adds up all of the numbers in the vector and divides that sum by the length
| of the vector.

...

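Arguments are usually the variables a function operates on. mean(), for example, takes a vector as its argument, as in mean(c(2,6,8)). It adds up all the numbers in the vector and divides the sum by the length of the vector.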

  |===========                                                                              |  12%
| In the following question you will be asked to modify a script that will appear as soon as you
| move on from this question. When you have finished modifying the script, save your changes to
| the script and type submit() and the script will be evaluated. There will be some comments in
| the script that opens up, so be sure to read them!

...

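In the following exercise, a script appears as soon as you move on from this question (in RStudio, the R script opens in the upper-left pane). When you have modified the script appropriately, save it and type submit(); the script will then be evaluated. The script contains a number of comments, so be sure to read them.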

boring_function.R

# You're about to write your first function! Just like you would assign a value 
# to a variable with the assignment operator, you assign functions in the following
# way:
# You are about to write your first function. Just as you assigned a value to a variable name, you assign a function as follows.
#
# function_name <- function(arg1, arg2){
#   # Manipulate arguments in some way
#   # Return a value
# }
# It looks like the template above.
#
# The "variable name" you assign will become the name of your function. arg1 and
# arg2 represent the arguments of your function. You can manipulate the arguments
# you specify within the function. After sourcing the function, you can use the 
# function by typing:
# 
# function_name(value1, value2)
# The variable name you assign becomes the name of your function, and arg1 and arg2 are its arguments. What happens to the arguments is written inside the function. Once the function has been written and sourced, you can call it as function_name(value1, value2).
#
# Below we will create a function called boring_function. This function takes
# the argument `x` as input, and returns the value of x without modifying it.
# Delete the pound sign in front of the x to make the function work! Be sure to 
# save this script and type submit() in the console after you make your changes.
# Below we create a function named boring_function. It takes the argument x and returns x unchanged. Delete the pound sign (#) in front of the x to make the function work. Then save the script and type submit() in the Console.

boring_function <- function(x) {
#  x
}

That is the full content of the script boring_function.R shown in the upper-left pane.

  |=============                                                                            |  14%
| The last R expression to be evaluated in a function will become the return value of that
| function. We want this function to take one argument, x, and return x without modifying it.
| Delete the pound sign so that x is returned without any modification. Make sure to save your
| script before you type submit().

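The last expression evaluated in a function becomes its return value. This function (in the script boring_function.R shown in the upper-left pane) should take one argument, x, and return x unchanged. Delete the pound sign (#) in front of the x so that x is returned as is. When you are done, save the script and type submit().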

boring_function <- function(x) {
  x
}

Edit the last part as shown above and save the script.

> submit()

| Sourcing your script...

| You are amazing!

Amazing.

  |===============                                                                          |  16%
| Now that you've created your first function let's test it! Type: boring_function('My first
| function!'). If your function works, it should just return the string: 'My first function!'

You have created your first function. Type boring_function('My first function!'). If the function works, it should simply return the string 'My first function!'.

> boring_function('My first function!')
[1] "My first function!"

| You nailed it! Good job!

Nicely done.

  |================                                                                         |  18%
| Congratulations on writing your first function. By writing functions, you can gain serious
| insight into how R works. As John Chambers, the creator of R once said:
| 
| To understand computations in R, two slogans are helpful: 1. Everything that exists is an
| object. 2. Everything that happens is a function call.

...

Congratulations. Writing functions gives you real insight into how R works. John Chambers, the creator of R, once said:

To understand computations in R, two slogans are helpful:
1. Everything that exists is an object.
2. Everything that happens is a function call.

  |==================                                                                       |  20%
| If you want to see the source code for any function, just type the function name without any
| arguments or parentheses. Let's try this out with the function you just created. Type:
| boring_function to view its source code.

To see the source code of a function, type just its name, without arguments or parentheses. Try it with the function you just created: type boring_function.

> boring_function
function(x) {
  x
}

| Keep working like that and you'll get there!

Keep going like that.

  |====================                                                                     |  22%
| Time to make a more useful function! We're going to replicate the functionality of the mean()
| function by creating a function called: my_mean(). Remember that to calculate the average of all
| of the numbers in a vector you find the sum of all the numbers in the vector, and then divide
| that sum by the number of numbers in the vector.

...

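Time for a more useful function. Let's reproduce mean() by writing a function called my_mean(). Recall that to compute an average you add up all the numbers in the vector and then divide that sum by the number of elements in the vector.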

my_mean.R

# You're free to implement the function my_mean however you want, as long as it
# returns the average of all of the numbers in `my_vector`.
# You may implement my_mean however you like, as long as it returns the average of the numbers in `my_vector`.
#
# Hint #1: sum() returns the sum of a vector.
#   Ex: sum(c(1, 2, 3)) evaluates to 6
# Hint 1: sum() computes the total of a numeric vector.
# Example 1: sum(c(1, 2, 3)) evaluates to 6.
#
# Hint #2: length() returns the size of a vector.
#   Ex: length(c(1, 2, 3)) evaluates to 3
# Hint 2: length() returns the size (length) of a vector.
# Example 2: length(c(1, 2, 3)) evaluates to 3.
#
# Hint #3: The mean of all the numbers in a vector is equal to the sum of all of
#          the numbers in the vector divided by the size of the vector.
# Hint 3: The mean of the elements of a vector is their sum divided by the size (length) of the vector.
#
# Note for those of you feeling super clever: Please do not use the mean()
# function while writing this function. We're trying to teach you something 
# here!
# Don't get clever and decide that you can simply call mean(). The point here is to learn about writing functions.
#
# Be sure to save this script and type submit() in the console after you make 
# your changes.
# After editing and saving this script, type submit() in the Console.

my_mean <- function(my_vector) {
  # Write your code here!
  # Remember: the last expression evaluated will be returned! 

}

That is the full content of the script my_mean.R shown in the upper-left pane.

  |======================                                                                   |  24%
| Make sure to save your script before you type submit().

Save the script, then type submit().

my_mean <- function(my_vector) {
  sum(my_vector)/length(my_vector)
}

Edit the last part as shown above and save the script.

> submit()

| Sourcing your script...


| You got it right!

That's right.

  |========================                                                                 |  27%
| Now test out your my_mean() function by finding the mean of the vector c(4, 5, 10).

Use the my_mean() function to find the mean of the vector c(4, 5, 10).

> my_mean(c(4, 5, 10))
[1] 6.333333

| All that hard work is paying off!

Hard work pays off.
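
One way to gain confidence in a hand-written function is to compare it with the built-in one. The sketch below repeats the my_mean() definition from the exercise and checks it against mean(); the names `x` and `agrees` are illustrative.

```r
my_mean <- function(my_vector) {
  # Sum of the elements divided by how many there are.
  sum(my_vector) / length(my_vector)
}

x <- c(4, 5, 10)
print(my_mean(x))

# all.equal() compares numbers with a small tolerance for rounding error.
agrees <- isTRUE(all.equal(my_mean(x), mean(x)))
print(agrees)
```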

  |=========================                                                                |  29%
| Next, let's try writing a function with default arguments. You can set default values for a
| function's arguments, and this can be useful if you think someone who uses your function will
| set a certain argument to the same value most of the time.

...

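Next, let's write a function with default arguments. You can set default values for a function's arguments, which is useful when people will call your function with the same value for a certain argument most of the time.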

remainder.R

# Let me show you an example of a function I'm going to make up called
# increment(). Most of the time I want to use this function to increase the
# value of a number by one. This function will take two arguments: "number" and
# "by" where "number" is the digit I want to increment and "by" is the amount I
# want to increment "number" by. I've written the function below. 
# First, here is a function called increment(). Most of the time it should add 1 to a number. It takes two arguments, "number" and "by", and increases "number" by the amount "by". The function looks like this:
#
# increment <- function(number, by = 1){
#     number + by
# }
#
# If you take a look in between the parentheses you can see that I've set
# "by" equal to 1. This means that the "by" argument will have the default
# value of 1.
# Between the parentheses, "by" is set equal to 1. This means the argument "by" has a default value of 1.
#
# I can now use the increment function without providing a value for "by": 
# increment(5) will evaluate to 6. 
# As a result, increment(5) evaluates to 6 even though no value is given for "by".
#
# However if I want to provide a value for the "by" argument I still can! The
# expression: increment(5, 2) will evaluate to 7. 
# You can still supply a value for "by" when you want to: increment(5, 2) evaluates to 7.
# 
# You're going to write a function called "remainder." remainder() will take
# two arguments: "num" and "divisor" where "num" is divided by "divisor" and
# the remainder is returned. Imagine that you usually want to know the remainder
# when you divide by 2, so set the default value of "divisor" to 2. Please be
# sure that "num" is the first argument and "divisor" is the second argument.
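# Now write a function called "remainder". It takes two arguments, "num" (the number being divided) and "divisor" (the number to divide by), and returns the remainder of "num" divided by "divisor". Assume you usually want the remainder after dividing by 2, so set the default value of "divisor" to 2. "num" must be the first argument and "divisor" the second.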
#
# Hint #1: You can use the modulus operator %% to find the remainder.
#   Ex: 7 %% 4 evaluates to 3. 
# Hint 1: You can use the modulus operator %% to compute the remainder.
#   Example: 7 %% 4 evaluates to 3.
#
# Remember to set appropriate default values! Be sure to save this 
# script and type submit() in the console after you write the function.
# Remember to set the default value. Then save the script and type submit() in the Console.

remainder <- function(num, divisor) {
  # Write your code here!
  # Remember: the last expression evaluated will be returned! 

}

That is the full content of the script remainder.R shown in the upper-left pane.

  |===========================                                                              |  31%
| Make sure to save your script before you type submit().

Save the script, then type submit().

remainder <- function(num, divisor = 2) {
  # Write your code here!
  # Remember: the last expression evaluated will be returned!
  num %% divisor
}

Edit the last part as shown above and save the script.

> submit()

| Sourcing your script...


| That's the answer I was looking for.

Yes, that is the expected answer.

  |=============================                                                            |  33%
| Let's do some testing of the remainder function. Run remainder(5) and see what happens.

Let's test the remainder function. Run remainder(5) and see what happens.

> remainder(5)
[1] 1

| Keep working like that and you'll get there!

Keep going like that.
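
The increment() function described in the comments of remainder.R makes a compact demonstration of default arguments. The sketch below is that function written out in full, with the two calls mentioned in the comments; the names `with_default` and `with_override` are illustrative.

```r
# "by" has a default value of 1, so it may be left out of the call.
increment <- function(number, by = 1) {
  number + by
}

with_default  <- increment(5)      # "by" falls back to 1
with_override <- increment(5, 2)   # "by" is given explicitly

print(c(with_default, with_override))
```

A default is a good fit when most callers want the same value, while the argument stays available for the rare case that needs something else.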

  |===============================                                                          |  35%
| Let's take a moment to examine what just happened. You provided one argument to the function,
| and R matched that argument to 'num' since 'num' is the first argument. The default value for
| 'divisor' is 2, so the function used the default value you provided.

...

Let's think about what happened. You passed a single argument to the function. R matched it to 'num' because 'num' is the first argument. The default value of 'divisor' is 2, so the function used that default.

  |=================================                                                        |  37%
| Now let's test the remainder function by providing two arguments. Type: remainder(11, 5) and
| let's see what happens.

This time pass two arguments: type remainder(11, 5).

> remainder(11, 5)
[1] 1

| That's correct!

That's correct.

  |===================================                                                      |  39%
| Once again, the arguments have been matched appropriately.

...

Once again, the arguments were matched appropriately.

  |====================================                                                     |  41%
| You can also explicitly specify arguments in a function. When you explicitly designate argument
| values by name, the ordering of the arguments becomes unimportant. You can try this out by
| typing: remainder(divisor = 11, num = 5).

You can also specify arguments explicitly. When you pass arguments by name, their order no longer matters. Type remainder(divisor = 11, num = 5).

> remainder(divisor = 11, num = 5)
[1] 5

| Keep working like that and you'll get there!

Keep going like that.

  |======================================                                                   |  43%
| As you can see, there is a significant difference between remainder(11, 5) and remainder(divisor
| = 11, num = 5)!

...

remainder(11, 5) and remainder(divisor = 11, num = 5) give very different results: the first computes 11 %% 5, the second 5 %% 11.
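
The difference between positional and named matching is easy to see when both calls sit next to each other. This sketch repeats the remainder() definition from the exercise so that it runs on its own; the result names are illustrative.

```r
remainder <- function(num, divisor = 2) {
  num %% divisor
}

by_position <- remainder(11, 5)                 # num = 11, divisor = 5
by_name     <- remainder(divisor = 11, num = 5) # num = 5,  divisor = 11

print(c(by_position, by_name))
```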

  |========================================                                                 |  45%
| R can also partially match arguments. Try typing remainder(4, div = 2) to see this feature in
| action.

R can also match argument names partially. Type remainder(4, div = 2).

> remainder(4, div = 2)
[1] 0

| You are quite good my friend!

You are doing well.

  |==========================================                                               |  47%
| A word of warning: in general you want to make your code as easy to understand as possible.
| Switching around the orders of arguments by specifying their names or only using partial
| argument names can be confusing, so use these features with caution!

...

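A word of warning: code should generally be as easy to understand as possible. Reordering arguments by naming them, or using only partial argument names, can be confusing, so use these features with care.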
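
As a brief illustration of that warning, the two calls below give the same answer, but only the second states plainly which argument is meant. The remainder() definition is repeated from the exercise; the result names are illustrative.

```r
remainder <- function(num, divisor = 2) {
  num %% divisor
}

partial <- remainder(4, div = 2)       # "div" is matched to "divisor"
full    <- remainder(4, divisor = 2)   # the full name leaves no doubt

print(c(partial, full))
```

Partial names save a few keystrokes at the Console, but in saved scripts the full name is usually the safer habit.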

  |============================================                                             |  49%
| With all of this talk about arguments, you may be wondering if there is a way you can see a
| function's arguments (besides looking at the documentation). Thankfully, you can use the args()
| function! Type: args(remainder) to examine the arguments for the remainder function.

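After all this talk about arguments, you may wonder whether you can check a function's arguments without reading the Help documentation. Thankfully, args() does exactly that. Type args(remainder) to examine the arguments of the remainder function.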

> args(remainder)
function (num, divisor = 2) 
NULL

| Excellent job!

Excellent.
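
args() prints the argument list for a human reader. When the names or defaults are needed as values, base R's formals() returns them as a list. The lesson does not cover formals(); it appears here only as a related sketch, and the variable names are illustrative.

```r
remainder <- function(num, divisor = 2) {
  num %% divisor
}

arg_names <- names(formals(remainder))          # the argument names
default_divisor <- formals(remainder)$divisor   # the default for "divisor"

print(arg_names)
print(default_divisor)
```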

  |=============================================                                            |  51%
| You may not realize it but I just tricked you into doing something pretty interesting! args() is
| a function, remainder() is a function, yet remainder was an argument for args(). Yes it's true:
| you can pass functions as arguments! This is a very powerful concept. Let's write a script to
| see how it works.

...

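You may not have noticed, but I just steered you into doing something quite interesting. args() is a function and remainder() is also a function, yet remainder was an argument to args(). That's right: you passed a function as an argument. This is a very powerful concept. Let's write a script to see how it works.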

evaluate.R

# You can pass functions as arguments to other functions just like you can pass
# data to functions. Let's say you define the following functions:
# Just as you can pass data to a function, you can pass another function as an argument. Suppose the following three functions are defined:
#
# add_two_numbers <- function(num1, num2){
#    num1 + num2
# }
#
# multiply_two_numbers <- function(num1, num2){
#   num1 * num2
# }
#
# some_function <- function(func){
#    func(2, 4)
# }
#
# As you can see we use the argument name "func" like a function inside of 
# "some_function()." By passing functions as arguments 
# some_function(add_two_numbers) will evaluate to 6, while
# some_function(multiply_two_numbers) will evaluate to 8.
# Inside "some_function()" the argument named "func" is used like a function. With a function passed as the argument, some_function(add_two_numbers) evaluates to 6, while some_function(multiply_two_numbers) evaluates to 8.
# 
# Finish the function definition below so that if a function is passed into the
# "func" argument and some data (like a vector) is passed into the dat argument
# the evaluate() function will return the result of dat being passed as an
# argument to func.
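# Complete the function below so that when a function is passed as "func" and some data (such as a vector) is passed as dat, evaluate() returns the result of passing dat to func.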
#
# Hints: This exercise is a little tricky so I'll provide a few example of how
# evaluate() should act:
#    1. evaluate(sum, c(2, 4, 6)) should evaluate to 12
#    2. evaluate(median, c(7, 40, 9)) should evaluate to 9
#    3. evaluate(floor, 11.1) should evaluate to 11
# Hints: this exercise is a little tricky, so here are examples of how evaluate() should behave:
# 1. evaluate(sum, c(2, 4, 6)) evaluates to 12
# 2. evaluate(median, c(7, 40, 9)) evaluates to 9
# 3. evaluate(floor, 11.1) evaluates to 11

evaluate <- function(func, dat){
  # Write your code here!
  # Remember: the last expression evaluated will be returned! 

}

That is the full content of the script evaluate.R shown in the upper-left pane.

  |===============================================                                          |  53%
| Make sure to save your script before you type submit().

Save the script, then type submit().

evaluate <- function(func, dat){
  # Write your code here!
  # Remember: the last expression evaluated will be returned!
  func(dat)
}

Edit the last part as shown above and save the script.

> submit()

| Sourcing your script...


| You nailed it! Good job!

Nicely done.

  |=================================================                                        |  55%
| Let's take your new evaluate() function for a spin! Use evaluate to find the standard deviation
| of the vector c(1.4, 3.6, 7.9, 8.8).

Let's take the new evaluate() function for a spin. Use evaluate to find the standard deviation (sd()) of the vector c(1.4, 3.6, 7.9, 8.8).

> evaluate(sd, c(1.4, 3.6, 7.9, 8.8))
[1] 3.514138

| You are amazing!

Amazing.
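
The three hints in evaluate.R double as test cases. The sketch below defines evaluate() as in the exercise and runs each hint; the list name `results` and its element names are illustrative.

```r
evaluate <- function(func, dat) {
  # Call whatever function was passed in, handing it the data.
  func(dat)
}

results <- list(
  total  = evaluate(sum, c(2, 4, 6)),
  middle = evaluate(median, c(7, 40, 9)),
  floor  = evaluate(floor, 11.1)
)
print(results)
```

Note that func(dat) has to be a line of code of its own: if it ends up inside a comment, the function body is empty and evaluate() returns NULL.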

  |===================================================                                      |  57%
| The idea of passing functions as arguments to other functions is an important and fundamental
| concept in programming.

...

Passing functions as arguments to other functions is an important and fundamental concept in programming.

  |=====================================================                                    |  59%
| You may be surprised to learn that you can pass a function as an argument without first defining
| the passed function. Functions that are not named are appropriately known as anonymous
| functions.

...

You may be surprised that a function can be passed as an argument without being defined (named) beforehand. Such unnamed functions are called anonymous functions.

  |======================================================                                   |  61%
| Let's use the evaluate function to explore how anonymous functions work. For the first argument
| of the evaluate function we're going to write a tiny function that fits on one line. In the
| second argument we'll pass some data to the tiny anonymous function in the first argument.

...

Let's use evaluate to see how anonymous functions work. The first argument will be a tiny function that fits on one line; the second argument will be the data passed to that anonymous function.

  |========================================================                                 |  63%
| Type the following command and then we'll discuss how it works: evaluate(function(x){x+1}, 6)

Type evaluate(function(x){x+1}, 6) and see what happens.

evaluate(function(x){x+1}, 6)
## [1] 7
| You are doing so well!

You are doing very well.

  |==========================================================                               |  65%
| The first argument is a tiny anonymous function that takes one argument `x` and returns `x+1`.
| We passed the number 6 into this function so the entire expression evaluates to 7.

...

The first argument is a tiny anonymous function that takes one argument x and returns x+1. Passing 6 to it makes the whole expression evaluate to 7.
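
To make the contrast concrete, here is a small sketch that passes the same computation once as a named function and once as an anonymous one. The names `double_it`, `named_result`, and `anonymous_result` are hypothetical and chosen only for this illustration.

```r
evaluate <- function(func, dat) {
  func(dat)
}

# Named version: define the function first, then pass it by name.
double_it <- function(x) {
  x * 2
}
named_result <- evaluate(double_it, 1:3)

# Anonymous version: write the function inline, without giving it a name.
anonymous_result <- evaluate(function(x) { x * 2 }, 1:3)

# Both calls produce the same vector.
print(identical(named_result, anonymous_result))
```

An anonymous function is convenient when the function is short and used only once.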

  |============================================================                             |  67%
| Try using evaluate() along with an anonymous function to return the first element of the vector
| c(8, 4, 0). Your anonymous function should only take one argument which should be a variable
| `x`.

Use evaluate() with an anonymous function that returns the first element of the vector c(8, 4, 0). The anonymous function should take a single argument named x.

evaluate(function(x){x[1]}, c(8, 4, 0))
## [1] 8
| Excellent work!

Excellent.

  |==============================================================                           |  69%
| Now try using evaluate() along with an anonymous function to return the last element of the
| vector c(8, 4, 0). Your anonymous function should only take one argument which should be a
| variable `x`.

Now use evaluate() with an anonymous function that returns the last element of c(8, 4, 0), again taking a single argument x.

evaluate(function(x){x[length(x)]}, c(8, 4, 0))
## [1] 0
| Great job!

Great job.

  |================================================================                         |  71%
| For the rest of the course we're going to use the paste() function frequently. Type ?paste so we
| can take a look at the documentation for the paste function.

The rest of this lesson uses the paste() function often. Type ?paste to look at its Help page.

> ?paste

| You are really on a roll!

You are making steady progress.

  |=================================================================                        |  73%
| As you can see the first argument of paste() is `...` which is referred to as an ellipsis or
| simply dot-dot-dot. The ellipsis allows an indefinite number of arguments to be passed into a
| function. In the case of paste() any number of strings can be passed as arguments and paste()
| will return all of the strings combined into one string.

...

The first argument of paste() is `...`, called an ellipsis or simply dot-dot-dot. An ellipsis lets a function accept any number of arguments. paste() takes any number of strings and returns them joined into one string.
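
The following sketch shows the `...` of paste() at work together with its two named arguments, `sep` and `collapse`. The variable names are assumptions made for this example.

```r
# Any number of strings can be passed through `...`.
three_words <- paste("a", "b", "c")

# `sep` sets the separator placed between the arguments.
dashed <- paste("a", "b", "c", sep = "-")

# `collapse` joins the elements of one vector into a single string.
collapsed <- paste(c("x", "y", "z"), collapse = "+")

print(c(three_words, dashed, collapsed))
```

Because `sep` and `collapse` come after the ellipsis, they have to be written by name; an unnamed extra string would simply be pasted with the others.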

  |===================================================================                      |  76%
| Just to see how paste() works, type paste("Programming", "is", "fun!")

To see how paste() works, type paste("Programming", "is", "fun!").

paste("Programming", "is", "fun!")
## [1] "Programming is fun!"
| Great job!

Great job.

  |=====================================================================                    |  78%
| Time to write our own modified version of paste().

...

Now let's write our own modified version of paste().

telegram.R

# The ellipses can be used to pass on arguments to other functions that are
# used within the function you're writing. Usually a function that has the
# ellipses as an argument has the ellipses as the last argument. The usage of
# such a function would look like:
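# (Gloss) The ellipsis can pass arguments on to other functions used inside the function you are writing. A function that takes an ellipsis usually has it as its last argument. Usage looks like this: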
#
# ellipses_func(arg1, arg2 = TRUE, ...)
#
# In the above example arg1 has no default value, so a value must be provided
# for arg1. arg2 has a default value, and other arguments can come after arg2
# depending on how they're defined in the ellipses_func() documentation.
# (Gloss) In the example above, arg1 has no default value, so a value must be supplied. arg2 has a default value. Further arguments can follow arg2, as described in the ellipses_func() Help.
# Interestingly the usage for the paste function is as follows:
# (Gloss) Note how paste itself is used:
#
# paste (..., sep = " ", collapse = NULL)
#
# Notice that the ellipses is the first argument, and all other arguments after
# the ellipses have default values. This is a strict rule in R programming: all
# arguments after an ellipses must have default values. Take a look at the
# simon_says function below:
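# (Gloss) The ellipsis comes first, and every argument after it has a default value. The script presents this as a strict rule in R programming: arguments after an ellipsis must have default values. See the simon_says function below: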
#
# simon_says <- function(...){
#   paste("Simon says:", ...)
# }
#
# The simon_says function works just like the paste function, except the
# begining of every string is prepended by the string "Simon says:"
# (Gloss) simon_says works almost exactly like paste, except that the string "Simon says:" is prepended.
#
# Telegrams used to be peppered with the words START and STOP in order to
# demarcate the beginning and end of sentences. Write a function below called 
# telegram that formats sentences for telegrams.
# For example the expression `telegram("Good", "morning")` should evaluate to:
# "START Good morning STOP"
# (Gloss) Telegrams used to mark the start and end of sentences with the words START and STOP. Define a function telegram that formats sentences this way; for example, `telegram("Good", "morning")` should return "START Good morning STOP".

telegram <- function(...){

}

That is the content of the script named telegram.R, shown in the upper-left pane.

  |=======================================================================                  |  80%
| Make sure to save your script before you type submit().

Save the script, then type submit().

telegram <- function(...){
  paste("START", ..., "STOP", sep = " ")
}

Edit the end of the script as shown above, then save it.

> submit()

| Sourcing your script...

| Nice work!

Nice work.

  |=========================================================================                |  82%
| Now let's test out your telegram function. Use your new telegram function passing in whatever
| arguments you wish!

Let's test your telegram function. Pass it any arguments you like.

telegram("日本語も", "大丈夫?", "ちょっと心配!")
## [1] "START 日本語も 大丈夫? ちょっと心配! STOP"
| Keep working like that and you'll get there!

Keep going like that.
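
One practical consequence of putting `...` first is that any argument defined after it can only be set by name. The sketch below uses a hypothetical function `shout()` with an assumed `punct` argument to show what happens when the name is left out.

```r
# `punct` comes after the ellipsis, so it is matched only by name.
shout <- function(..., punct = "!") {
  paste0(paste(...), punct)
}

default_mark <- shout("Good", "morning")               # uses punct = "!"
named_mark   <- shout("Good", "morning", punct = "?")  # punct set by name
unnamed_mark <- shout("Good", "morning", "?")          # "?" is absorbed by `...`

print(c(default_mark, named_mark, unnamed_mark))
```

In the last call the unnamed "?" is treated as one more word to paste, and the default "!" is still appended.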

mad_libs.R

# Let's explore how to "unpack" arguments from an ellipses when you use the
# ellipses as an argument in a function. Below I have an example function that
# is supposed to add two explicitly named arguments called alpha and beta.
# (Gloss) Let's see how to unpack arguments from an ellipsis used as a function argument. The example below adds two explicitly named arguments, alpha and beta.
# 
# add_alpha_and_beta <- function(...){
#   # First we must capture the ellipsis inside of a list
#   # and then assign the list to a variable. Let's name this
#   # variable `args`.
#   # (Gloss) First capture the ellipsis as a list and assign it to a variable, which we name `args`.
#
#   args <- list(...)
#
#   # We're now going to assume that there are two named arguments within args
#   # with the names `alpha` and `beta.` We can extract named arguments from
#   # the args list by using the name of the argument and double brackets. The
#   # `args` variable is just a regular list after all!
#   # (Gloss) Assume args holds two named arguments, `alpha` and `beta`. Extract them from args by name with double brackets [[ ]]. `args` is an ordinary list.
#   
#   alpha <- args[["alpha"]]
#   beta  <- args[["beta"]]
#
#   # Then we return the sum of alpha and beta.
#   # (Gloss) Then return the sum of alpha and beta.
#
#   alpha + beta 
# }
#
# Have you ever played Mad Libs before? The function below will construct a
# sentence from parts of speech that you provide as arguments. We'll write most
# of the function, but you'll need to unpack the appropriate arguments from the
# ellipses.
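# (Gloss) Have you played Mad Libs? The function below builds a sentence from the parts of speech you supply as arguments. Most of it is already written, but you need to unpack the appropriate arguments from the ellipsis.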

mad_libs <- function(...){
  # Do your argument unpacking here!
  # (Gloss) Unpack the arguments here.

  # Don't modify any code below this comment.
  # Notice the variables you'll need to create in order for the code below to
  # be functional!
  # (Gloss) Do not modify the code below this comment. Create the variables that the code below needs in order to work.
  paste("News from", place, "today where", adjective, "students took to the streets in protest of the new", noun, "being installed on campus.")
}

That is the content of the script named mad_libs.R, shown in the upper-left pane.

  |==========================================================================               |  84%
| Make sure to save your script before you type submit().

Save the script, then type submit().

mad_libs <- function(...){
  # Do your argument unpacking here!
  args <- list(...)
  place <- args[["place"]]
  adjective <- args[["adjective"]]
  noun <- args[["noun"]]
  # Don't modify any code below this comment.
  # Notice the variables you'll need to create in order for the code below to
  # be functional!
  paste("News from", place, "today where", adjective, "students took to the streets in protest of the new", noun, "being installed on campus.")
}

Edit the script as shown above, then save it.

> submit()

| Sourcing your script...


| Your dedication is inspiring!

Truly inspiring.

  |============================================================================             |  86%
| Time to use your mad_libs function. Make sure to name the place, adjective, and noun arguments
| in order for your function to work.

Time to use your mad_libs function. Pass the place, adjective, and noun arguments by name so that the function works.

mad_libs(place = "SL Learning Site", adjective = "many", noun = "change of policy")
## [1] "News from SL Learning Site today where many students took to the streets in protest of the new change of policy being installed on campus."
| Perseverance, that's the answer.

Perseverance pays off. That is the answer.
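
The unpacking pattern used in mad_libs can be sketched in a more general form. The function `describe_args()` below is hypothetical; it captures `...` as a list, checks that every argument was named, and reports the name and value pairs.

```r
describe_args <- function(...) {
  args <- list(...)          # capture the ellipsis as an ordinary list
  arg_names <- names(args)   # NULL when no argument was named

  # Refuse unnamed arguments, since they could not be looked up by name.
  if (is.null(arg_names) || any(arg_names == "")) {
    stop("all arguments must be named")
  }

  paste(arg_names, unlist(args), sep = "=", collapse = ", ")
}

summary_text <- describe_args(place = "Tokyo", noun = "bridge")
print(summary_text)
```

A check like this is one way to catch a call such as mad_libs("Tokyo"), where the value is supplied but cannot be found under the name `place`.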

  |==============================================================================           |  88%
| We're coming to the end of this lesson, but there's still one more idea you should be made aware
| of.

...

We are near the end of this lesson, but there is one more idea worth knowing.

  |================================================================================         |  90%
| You're familiar with adding, subtracting, multiplying, and dividing numbers in R. To do this you
| use the +, -, *, and / symbols. These symbols are called binary operators because they take two
| inputs, an input from the left and an input from the right.

...

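You already know how to add, subtract, multiply, and divide numbers in R with the symbols +, -, *, and /. These are called binary operators because they take two inputs, one on the left and one on the right.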

  |==================================================================================       |  92%
| In R you can define your own binary operators. In the next script I'll show you how.

...

In R you can define your own binary operators. The next script shows how.

bin_op.R

# The syntax for creating new binary operators in R is unlike anything else in
# R, but it allows you to define a new syntax for your function. I would only
# recommend making your own binary operator if you plan on using it often!
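# (Gloss) The syntax for creating a binary operator is unlike anything else in R, but it lets you define new syntax for your function. Define your own binary operator only if you expect to use it often.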
#
# User-defined binary operators have the following syntax:
# (Gloss) A user-defined binary operator has this form:
#      %[whatever]% 
# where [whatever] represents any valid variable name.
# (Gloss) Here [whatever] stands for any valid variable name.
# 
# Let's say I wanted to define a binary operator that multiplied two numbers and
# then added one to the product. An implementation of that operator is below:
# (Gloss) Suppose we want a binary operator that multiplies two numbers and adds one to the product. It can be implemented as follows:
#
# "%mult_add_one%" <- function(left, right){ # Notice the quotation marks!
#   left * right + 1
# }
#
# I could then use this binary operator like `4 %mult_add_one% 5` which would
# evaluate to 21.
# (Gloss) It is used as `4 %mult_add_one% 5`, which gives 21.
#
# Write your own binary operator below from absolute scratch! Your binary
# operator must be called %p% so that the expression:
# (Gloss) Write your own binary operator from scratch. It must be called %p%, so that the expression
#
#       "Good" %p% "job!"
#
# will evaluate to: "Good job!"
# (Gloss) gives "Good job!" as its result.

"%p%" <- function( ){ # Remember to add arguments!

}

That is the content of the script named bin_op.R, shown in the upper-left pane.

  |====================================================================================     |  94%
| Make sure to save your script before you type submit().

Save the script, then type submit().

"%p%" <- function(...){ # Remember to add arguments!
  paste(...)
}

Edit the script as shown above, then save it.

> submit()

| Sourcing your script...

| Keep up the great work!

Keep it up.

  |=====================================================================================    |  96%
| You made your own binary operator! Let's test it out. Paste together the strings: 'I', 'love',
| 'R!' using your new binary operator.

You have defined your own binary operator. Let's test it: join the strings 'I', 'love', 'R!' with your new operator.

'I' %p% 'love' %p% 'R!'
## [1] "I love R!"
| All that hard work is paying off!

Hard work pays off.
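
Since a binary operator always receives exactly two inputs, it can also be written with two explicit arguments instead of an ellipsis. This is only an alternative sketch; the argument names `left` and `right` follow the %mult_add_one% example in the script, and `greeting` and `prefix_call` are assumed names.

```r
# The operator name must be quoted when it is defined.
"%p%" <- function(left, right) {
  paste(left, right)
}

# Chained use is evaluated left to right: ("I" %p% "love") %p% "R!"
greeting <- "I" %p% "love" %p% "R!"
print(greeting)

# The same function can be called in prefix form with backticks.
prefix_call <- `%p%`("Good", "job!")
print(prefix_call)
```

Naming the two arguments makes it explicit which value comes from the left side of the operator and which from the right.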

  |=======================================================================================  |  98%
| We've come to the end of our lesson! Go out there and write some great functions!

...

We have reached the end of the lesson. Go and write some great functions!

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No

Selection: 2

Would you like credit on Coursera.org for completing this course?

Choose 2. This was explained several times earlier, so the details are omitted here.

| All that practice is paying off!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

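All that practice is paying off.

You have reached the end of the lesson and are returned to the main menu.

Choose a course, or type 0 to exit swirl.

1: R Programming
2: Go to the swirl course repository

Selection: 0 (entered by SL)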

| Leaving swirl now. Type swirl() to resume.

Leaving swirl. Type swirl() to resume.

>

2.10. lapply and sapply

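This lesson covers two of the powerful apply-family functions, lapply (list apply) and sapply (simplify apply). Both are very useful for efficient data analysis.
The translation and commentary start from the lesson selection. If anything is unclear, review 1.2 or the beginning of 1.1.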

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 10

We chose 10: lapply and sapply (Lesson 10).

  |                                                                                         |   0%

| In this lesson, you'll learn how to use lapply() and sapply(), the two most important members of
| R's *apply family of functions, also known as loop functions.

...

In this lesson you will learn how to use lapply() and sapply(), the two most important members of R's *apply family of functions, which are also known as loop functions.

  |==                                                                                       |   2%
| These powerful functions, along with their close relatives (vapply() and tapply(), among others)
| offer a concise and convenient means of implementing the Split-Apply-Combine strategy for data
| analysis.

...

Together with their close relatives (vapply(), tapply(), and others), these functions offer a concise and convenient way to carry out the Split-Apply-Combine strategy for data analysis.

  |====                                                                                     |   4%
| Each of the *apply functions will SPLIT up some data into smaller pieces, APPLY a function to
| each piece, then COMBINE the results. A more detailed discussion of this strategy is found in
| Hadley Wickham's Journal of Statistical Software paper titled 'The Split-Apply-Combine Strategy
| for Data Analysis'.

...

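Each *apply function (the * stands for the varying first letter) splits the data into smaller pieces, applies a function to each piece, and combines the results. For details, see Hadley Wickham's paper 'The Split-Apply-Combine Strategy for Data Analysis' in the Journal of Statistical Software.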

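Hadley Wickham is also well known as a developer of R packages (collections of functions). He is the central figure behind the tidyverse packages such as ggplot2. The paper above can be downloaded from The Foundation for Open Access Statistics.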
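
The three steps can be written out by hand with base R. This is a sketch on made-up data, not the lesson's dataset; the data frame `scores` and its columns `group` and `value` are assumptions for illustration.

```r
# Hypothetical data: five values belonging to two groups.
scores <- data.frame(
  group = c("a", "b", "a", "b", "a"),
  value = c(1, 2, 3, 4, 8)
)

# SPLIT: one vector of values per group.
pieces <- split(scores$value, scores$group)

# APPLY: compute the mean of each piece; the result is a list.
piece_means <- lapply(pieces, mean)

# COMBINE: flatten the list into a named numeric vector.
combined <- unlist(piece_means)
print(combined)
```

Functions such as sapply() and tapply() bundle some or all of these steps into a single call.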

  |=====                                                                                    |   6%
| Throughout this lesson, we'll use the Flags dataset from the UCI Machine Learning Repository.
| This dataset contains details of various nations and their flags. More information may be found
| here: http://archive.ics.uci.edu/ml/datasets/Flags

...

This lesson uses the Flags dataset from the UCI Machine Learning Repository. The dataset contains details of various nations and their flags. More information is available at http://archive.ics.uci.edu/ml/datasets/Flags

UCI stands for University of California, Irvine. The UCI Machine Learning Repository publishes many datasets, and some of them are also included among R's built-in datasets. The flags dataset ships with the swirl R Programming course but is not an R built-in dataset. It can also be downloaded from the link above.

  |=======                                                                                  |   8%
| Let's jump right in so you can get a feel for how these special functions work!

...

Let's get a feel for how these functions work.

  |=========                                                                                |  10%
| I've stored the dataset in a variable called flags. Type head(flags) to preview the first six
| lines (i.e. the 'head') of the dataset.

The dataset is stored in a variable called flags. Type head(flags) to preview the first six rows (the 'head') of the data.

> head(flags)
            name landmass zone area population language religion bars stripes colours red green
1    Afghanistan        5    1  648         16       10        2    0       3       5   1     1
2        Albania        3    1   29          3        6        6    0       0       3   1     0
3        Algeria        4    1 2388         20        8        2    2       0       3   1     1
4 American-Samoa        6    3    0          0        1        1    0       0       5   1     0
5        Andorra        3    1    0          0        6        0    3       0       3   1     0
6         Angola        4    2 1247          7       10        5    0       2       3   1     0
  blue gold white black orange mainhue circles crosses saltires quarters sunstars crescent
1    0    1     1     1      0   green       0       0        0        0        1        0
2    0    1     0     1      0     red       0       0        0        0        1        0
3    0    0     1     0      0   green       0       0        0        0        1        1
4    1    1     1     0      1    blue       0       0        0        0        0        0
5    1    1     0     0      0    gold       0       0        0        0        0        0
6    0    1     0     1      0     red       0       0        0        0        1        0
  triangle icon animate text topleft botright
1        0    1       0    0   black    green
2        0    0       1    0     red      red
3        0    0       0    0   green    white
4        1    1       1    0    blue      red
5        0    0       0    0    blue      red
6        0    1       0    0     red    black

| That's a job well done!

Well done.

  |===========                                                                              |  12%
| You may need to scroll up to see all of the output. Now, let's check out the dimensions of the
| dataset using dim(flags).

You may need to scroll up to see all of the output. Now check the dimensions of the dataset with dim(flags).

> dim(flags)
[1] 194  30

| That's a job well done!

Well done.

  |============                                                                             |  14%
| This tells us that there are 194 rows, or observations, and 30 columns, or variables. Each
| observation is a country and each variable describes some characteristic of that country or its
| flag. To open a more complete description of the dataset in a separate text file, type
| viewinfo() when you are back at the prompt (>).

...

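The data has 194 rows (observations) and 30 columns (variables). Each row corresponds to a country, and each variable describes a characteristic of that country or its flag. For a fuller description, type viewinfo() at the prompt (>).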

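Note: viewinfo() shows the contents of the flag.names.txt file. You can check it by pausing swirl with play() and returning with nxt(), or by saving your progress and quitting with bye() and then resuming with swirl(). The file contents are appended after this section.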

  |==============                                                                           |  16%
| As with any dataset, we'd like to know in what format the variables have been stored. In other
| words, what is the 'class' of each variable? What happens if we do class(flags)? Try it out.

With any dataset, we want to know in what format each variable is stored, that is, the 'class' of each variable. What happens if we run class(flags)? Try it.

> class(flags)
[1] "data.frame"

| Perseverance, that's the answer.

Perseverance pays off. That is the answer.

  |================                                                                         |  18%
| That just tells us that the entire dataset is stored as a 'data.frame', which doesn't answer our
| question. What we really need is to call the class() function on each individual column. While
| we could do this manually (i.e. one column at a time) it's much faster if we can automate the
| process. Sounds like a loop!

...

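This only tells us that the whole dataset is stored as a 'data.frame', which is not what we wanted to know. What we need is the class() of each column. We could check the columns one at a time by hand, but it is much faster to automate the process. That sounds like a loop.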

  |==================                                                                       |  20%
| The lapply() function takes a list as input, applies a function to each element of the list,
| then returns a list of the same length as the original one. Since a data frame is really just a
| list of vectors (you can see this with as.list(flags)), we can use lapply() to apply the class()
| function to each column of the flags dataset. Let's see it in action!

...

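lapply() takes a list as input, applies a function to each element, and returns a list of the same length as the original. A data frame is really just a list of vectors (as.list(flags) shows this), so lapply() can apply class() to each column of flags. Let's see it in action.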
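
The same behaviour can be tried on a small data frame without the flags data. The data frame `toy` and its columns below are made up for this sketch.

```r
# A hypothetical three-column data frame.
toy <- data.frame(
  name  = c("x", "y"),
  count = c(1L, 2L),
  ratio = c(0.5, 1.5),
  stringsAsFactors = FALSE
)

# A data frame is a list with one element per column.
print(is.list(toy))
print(length(as.list(toy)))

# lapply() visits each column and returns a list of the same length.
col_classes <- lapply(toy, class)
print(col_classes)
```

The names of the returned list are the column names, which is why the lesson's output is labelled $name, $landmass, and so on.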

  |====================                                                                     |  22%
| Type cls_list <- lapply(flags, class) to apply the class() function to each column of the flags
| dataset and store the result in a variable called cls_list. Note that you just supply the name
| of the function you want to apply (i.e. class), without the usual parentheses after it.

Type cls_list <- lapply(flags, class) to apply class() to each column of flags and store the result in cls_list. Note that you supply only the name of the function (here class), without the usual parentheses.

> cls_list <- lapply(flags, class)

| Perseverance, that's the answer.

Perseverance pays off. That is the answer.

  |=====================                                                                    |  24%
| Type cls_list to view the result.

Type cls_list to view the result.

> cls_list
$name
[1] "factor"

$landmass
[1] "integer"

$zone
[1] "integer"

$area
[1] "integer"

$population
[1] "integer"

$language
[1] "integer"

$religion
[1] "integer"

$bars
[1] "integer"

$stripes
[1] "integer"

$colours
[1] "integer"

$red
[1] "integer"

$green
[1] "integer"

$blue
[1] "integer"

$gold
[1] "integer"

$white
[1] "integer"

$black
[1] "integer"

$orange
[1] "integer"

$mainhue
[1] "factor"

$circles
[1] "integer"

$crosses
[1] "integer"

$saltires
[1] "integer"

$quarters
[1] "integer"

$sunstars
[1] "integer"

$crescent
[1] "integer"

$triangle
[1] "integer"

$icon
[1] "integer"

$animate
[1] "integer"

$text
[1] "integer"

$topleft
[1] "factor"

$botright
[1] "factor"


| All that hard work is paying off!

Hard work pays off.

  |=======================                                                                  |  26%
| The 'l' in 'lapply' stands for 'list'. Type class(cls_list) to confirm that lapply() returned a
| list.

The 'l' in 'lapply' stands for 'list'. Type class(cls_list) to confirm that lapply() returned a list.

> class(cls_list)
[1] "list"

| That's the answer I was looking for.

Yes, that is the answer we were looking for.

  |=========================                                                                |  28%
| As expected, we got a list of length 30 -- one element for each variable/column. The output
| would be considerably more compact if we could represent it as a vector instead of a list.

...

As expected, we got a list of length 30, one element per variable (column). The output would be much more compact as a vector than as a list.

  |===========================                                                              |  30%
| You may remember from a previous lesson that lists are most helpful for storing multiple classes
| of data. In this case, since every element of the list returned by lapply() is a character
| vector of length one (i.e. "integer" and "vector"), cls_list can be simplified to a character
| vector. To do this manually, type as.character(cls_list).

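You may remember from an earlier lesson that lists are well suited to storing data of several classes. Here, every element of the list returned by lapply() is a character vector of length one (such as "integer" or "factor"), so cls_list can be simplified to a character vector. To do this by hand, type as.character(cls_list).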
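
A small sketch of this manual simplification, using a made-up list `cls_example` in place of cls_list. It also shows unlist(), a base R alternative that is not used in the lesson at this point but keeps the element names.

```r
# A hypothetical list shaped like the result of lapply(flags, class).
cls_example <- list(name = "factor", area = "integer", zone = "integer")

# as.character() gives a plain character vector; the names are dropped.
plain <- as.character(cls_example)
print(plain)

# unlist() also flattens the list but keeps the names.
named <- unlist(cls_example)
print(named)
```

Keeping the names is usually more useful, because it shows which class belongs to which column.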

> as.character(cls_list)
 [1] "factor"  "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer"
[10] "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "factor" 
[19] "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer"
[28] "integer" "factor"  "factor" 

| Perseverance, that's the answer.

Perseverance pays off. That is the answer.

  |============================                                                             |  32%
| sapply() allows you to automate this process by calling lapply() behind the scenes, but then
| attempting to simplify (hence the 's' in 'sapply') the result for you. Use sapply() the same way
| you used lapply() to get the class of each column of the flags dataset and store the result in
| cls_vect. If you need help, type ?sapply to bring up the documentation.

sapply() automates this process: it calls lapply() behind the scenes and then tries to simplify the result (hence the 's' in 'sapply'). Use sapply() in the same way you used lapply() to get the class of each column of flags, and store the result in cls_vect. If you need help, type ?sapply to read the documentation.

> cls_vect <- sapply(flags, class)

| You got it!

You got it.

  |==============================                                                           |  34%
| Use class(cls_vect) to confirm that sapply() simplified the result to a character vector.

Use class(cls_vect) to confirm that sapply() simplified the result to a character vector.

> class(cls_vect)
[1] "character"

| You are doing so well!

You are doing very well.

  |================================                                                         |  36%
| In general, if the result is a list where every element is of length one, then sapply() returns
| a vector. If the result is a list where every element is a vector of the same length (> 1),
| sapply() returns a matrix. If sapply() can't figure things out, then it just returns a list, no
| different from what lapply() would give you.

...

In general, if every element of the resulting list has length one, sapply() returns a vector. If every element is a vector of the same length (> 1), sapply() returns a matrix. If sapply() cannot work out a simpler form, it returns a list, exactly as lapply() would.
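
The three cases can be reproduced on a small made-up list. The object names `nums`, `sums`, `ranges`, and `ragged` are assumptions for this sketch.

```r
nums <- list(a = 1:3, b = 4:6)

# Case 1: each result has length one, so sapply() returns a named vector.
sums <- sapply(nums, sum)
print(sums)

# Case 2: each result has the same length (2), so sapply() returns a matrix
# with one column per list element.
ranges <- sapply(nums, range)
print(ranges)

# Case 3: the results have different lengths, so sapply() returns a list.
ragged <- sapply(list(a = 1:3, b = 1:5), function(x) x[x > 2])
print(ragged)
```

Because the return type depends on the results, it is worth checking with class() or str() when the shape of the output matters.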

  |==================================                                                       |  38%
| Let's practice using lapply() and sapply() some more!

...

Let's practice lapply() and sapply() some more.

  |====================================                                                     |  40%
| Columns 11 through 17 of our dataset are indicator variables, each representing a different
| color. The value of the indicator variable is 1 if the color is present in a country's flag and
| 0 otherwise.

...

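Columns 11 through 17 of the dataset are indicator variables, each representing a different color. The value is 1 if the color appears in the country's flag and 0 otherwise.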

  |=====================================                                                    |  42%
| Therefore, if we want to know the total number of countries (in our dataset) with, for example,
| the color orange on their flag, we can just add up all of the 1s and 0s in the 'orange' column.
| Try sum(flags$orange) to see this.

So to find how many countries in the dataset have, for example, orange in their flag, we can add up the 1s and 0s in the 'orange' column. Try sum(flags$orange).

> sum(flags$orange)
[1] 26

| Perseverance, that's the answer.

Perseverance pays off. That is the answer.

  |=======================================                                                  |  44%
| Now we want to repeat this operation for each of the colors recorded in the dataset.

...

Now we want to repeat the same operation for each of the other colors recorded in the dataset.

  |=========================================                                                |  46%
| First, use flag_colors <- flags[, 11:17] to extract the columns containing the color data and
| store them in a new data frame called flag_colors. (Note the comma before 11:17. This subsetting
| command tells R that we want all rows, but only columns 11 through 17.)

First, use flag_colors <- flags[, 11:17] to extract the columns that hold the color data and store them in a new data frame called flag_colors. (Note the comma before 11:17. It tells R that we want all rows but only columns 11 through 17.)
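
The role of the comma can be checked on a small made-up data frame. `toy` and the other names below are assumptions for this sketch.

```r
toy <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# Empty row index before the comma: keep all rows, columns 2 to 3 only.
middle <- toy[, 2:3]
print(middle)

# Selecting a single column this way drops to a plain vector.
single <- toy[, 2]
print(single)

# drop = FALSE keeps the one-column result as a data frame.
kept <- toy[, 2, drop = FALSE]
print(class(kept))
```

With several columns selected, as in flags[, 11:17], the result stays a data frame, which is what lapply() and sapply() then loop over.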

> flag_colors <- flags[, 11:17]

| Excellent job!

Excellent.

  |===========================================                                              |  48%
| Use the head() function to look at the first 6 lines of flag_colors.

Use head() to look at the first 6 rows of flag_colors.

> head(flag_colors)
  red green blue gold white black orange
1   1     1    0    1     1     1      0
2   1     0    0    1     0     1      0
3   1     1    0    0     1     0      0
4   1     0    1    1     1     0      1
5   1     0    1    1     0     0      0
6   1     0    0    1     0     1      0

| Nice work!

Nice work.

  |============================================                                             |  50%
| To get a list containing the sum of each column of flag_colors, call the lapply() function with
| two arguments. The first argument is the object over which we are looping (i.e. flag_colors) and
| the second argument is the name of the function we wish to apply to each column (i.e. sum).
| Remember that the second argument is just the name of the function with no parentheses, etc.

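To get a list of the column sums of flag_colors, call lapply() with two arguments. The first is the object to loop over (here flag_colors) and the second is the name of the function to apply to each column (here sum). The second argument is just the function name, without parentheses.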

> lapply(flag_colors, sum)
$red
[1] 153

$green
[1] 91

$blue
[1] 99

$gold
[1] 91

$white
[1] 146

$black
[1] 52

$orange
[1] 26


| That's a job well done!

Well done.

  |==============================================                                           |  52%
| This tells us that of the 194 flags in our dataset, 153 contain the color red, 91 contain green,
| 99 contain blue, and so on.

...

This tells us that of the 194 flags in the dataset, 153 contain red, 91 contain green, 99 contain blue, and so on.

  |================================================                                         |  54%
| The result is a list, since lapply() always returns a list. Each element of this list is of
| length one, so the result can be simplified to a vector by calling sapply() instead of lapply().
| Try it now.

The result is a list because lapply() always returns a list. Each element has length one, so the result can be simplified to a vector by calling sapply() instead of lapply(). Try it.

> sapply(flag_colors, sum)
   red  green   blue   gold  white  black orange 
   153     91     99     91    146     52     26 

| Excellent work!

Excellent.

  |==================================================                                       |  56%
| Perhaps it's more informative to find the proportion of flags (out of 194) containing each
| color. Since each column is just a bunch of 1s and 0s, the arithmetic mean of each column will
| give us the proportion of 1s. (If it's not clear why, think of a simpler situation where you
| have three 1s and two 0s -- (1 + 1 + 1 + 0 + 0)/5 = 3/5 = 0.6).

...

It may be more informative to find the proportion of the 194 flags that contain each color. Each column consists only of 1s and 0s, so its arithmetic mean is the proportion of 1s. (If that is not obvious, consider three 1s and two 0s: (1 + 1 + 1 + 0 + 0)/5 = 3/5 = 0.6.)
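
The arithmetic in the parenthesis can be verified directly. The vector `indicator` below is a made-up stand-in for one color column.

```r
# Three flags contain the color (1) and two do not (0).
indicator <- c(1, 1, 1, 0, 0)

count_of_ones      <- sum(indicator)   # how many 1s
proportion_of_ones <- mean(indicator)  # sum divided by the number of values

print(count_of_ones)
print(proportion_of_ones)
```

The mean equals sum(indicator) / length(indicator), so for a 0/1 column it is the share of rows where the value is 1.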

  |====================================================                                     |  58%
| Use sapply() to apply the mean() function to each column of flag_colors. Remember that the
| second argument to sapply() should just specify the name of the function (i.e. mean) that you
| want to apply.

Use sapply() to apply the mean() function to each column of flag_colors. The second argument to sapply() is just the function name (here, mean).

> sapply(flag_colors, mean)
      red     green      blue      gold     white     black    orange 
0.7886598 0.4690722 0.5103093 0.4690722 0.7525773 0.2680412 0.1340206 

| Keep working like that and you'll get there!

Keep going like that.

  |=====================================================                                    |  60%
| In the examples we've looked at so far, sapply() has been able to simplify the result to vector.
| That's because each element of the list returned by lapply() was a vector of length one. Recall
| that sapply() instead returns a matrix when each element of the list returned by lapply() is a
| vector of the same length (> 1).

...

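In the examples so far, sapply() was able to simplify the result to a vector, because each element of the list returned by lapply() was a vector of length one. When each element is instead a vector of the same length (> 1), sapply() returns a matrix.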
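The matrix case can be seen on a small example before working with the real data. The data frame toy_shapes below is made up for illustration; only range(), lapply() and sapply() come from the lesson.

```r
# Hypothetical counts of two shapes on four flags.
toy_shapes <- data.frame(
  circles = c(0, 1, 4, 0),
  crosses = c(0, 2, 0, 1)
)

# range() returns two numbers (min, max) per column.
rng_list <- lapply(toy_shapes, range)  # list of length-2 vectors
rng_mat  <- sapply(toy_shapes, range)  # same values as a 2 x 2 matrix

print(rng_list)
print(rng_mat)
print(dim(rng_mat))  # 2 rows (min, max), one column per variable
```

Each length-2 result becomes one column of the matrix, so row 1 holds the minima and row 2 the maxima.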

  |=======================================================                                  |  62%
| To illustrate this, let's extract columns 19 through 23 from the flags dataset and store the
| result in a new data frame called flag_shapes. flag_shapes <- flags[, 19:23] will do it.

To see this, extract columns 19 through 23 of the flags dataset and store them in a new data frame called flag_shapes, using flag_shapes <- flags[, 19:23].

> flag_shapes <- flags[, 19:23]

| You are amazing!

Amazing.

  |=========================================================                                |  64%
| Each of these columns (i.e. variables) represents the number of times a particular shape or
| design appears on a country's flag. We are interested in the minimum and maximum number of times
| each shape or design appears.

...

Each of these columns (variables) records how many times a particular shape or design appears on a country's flag. We want the minimum and maximum count for each shape or design.

  |===========================================================                              |  66%
| The range() function returns the minimum and maximum of its first argument, which should be a
| numeric vector. Use lapply() to apply the range function to each column of flag_shapes. Don't
| worry about storing the result in a new variable. By now, we know that lapply() always returns a
| list.

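range() returns the minimum and maximum of its first argument, which should be a numeric vector. Use lapply() to apply range to each column of flag_shapes. There is no need to store the result in a new variable. As we know by now, lapply() always returns a list.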

> lapply(flag_shapes, range)
$circles
[1] 0 4

$crosses
[1] 0 2

$saltires
[1] 0 1

$quarters
[1] 0 4

$sunstars
[1]  0 50


| All that practice is paying off!

The practice is paying off.

  |=============================================================                            |  68%
| Do the same operation, but using sapply() and store the result in a variable called shape_mat.

Do the same with sapply() and store the result in shape_mat.

> shape_mat <- sapply(flag_shapes, range)

| You are amazing!

Amazing.

  |==============================================================                           |  70%
| View the contents of shape_mat.

View the contents of shape_mat.

> shape_mat
     circles crosses saltires quarters sunstars
[1,]       0       0        0        0        0
[2,]       4       2        1        4       50

| That's the answer I was looking for.

Yes, that is the expected answer.

  |================================================================                         |  72%
| Each column of shape_mat gives the minimum (row 1) and maximum (row 2) number of times its
| respective shape appears in different flags.

...

Each column of shape_mat gives the minimum (row 1) and maximum (row 2) number of times the corresponding shape appears across the flags.

  |==================================================================                       |  74%
| Use the class() function to confirm that shape_mat is a matrix.

Use class() to confirm that shape_mat is a matrix.

> class(shape_mat)
[1] "matrix"

| All that hard work is paying off!

The hard work is paying off.
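The class() output in this log was recorded with an older version of R. From R 4.0.0 on, class() of a matrix returns two strings, "matrix" and "array", so a check that does not depend on the R version may be preferable. A minimal sketch, with a made-up matrix m:

```r
# Build a small matrix the same way shape_mat was built (values are hypothetical).
m <- sapply(list(a = c(3, 1, 2), b = c(9, 7, 8)), range)

# class(m) is "matrix" before R 4.0.0 and c("matrix", "array") from 4.0.0 on.
print(class(m))

# Both checks below give TRUE regardless of the R version.
is_mat   <- is.matrix(m)
has_name <- "matrix" %in% class(m)
print(is_mat)
print(has_name)
```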

  |====================================================================                     |  76%
| As we've seen, sapply() always attempts to simplify the result given by lapply(). It has been
| successful in doing so for each of the examples we've looked at so far. Let's look at an example
| where sapply() can't figure out how to simplify the result and thus returns a list, no different
| from lapply().

...

As we have seen, sapply() always tries to simplify the result of lapply(), and it has succeeded in every example so far. Now let's look at a case where sapply() cannot simplify the result and therefore returns a list, exactly as lapply() would.

  |=====================================================================                    |  78%
| When given a vector, the unique() function returns a vector with all duplicate elements removed.
| In other words, unique() returns a vector of only the 'unique' elements. To see how it works,
| try unique(c(3, 4, 5, 5, 5, 6, 6)).

Given a vector, unique() returns a vector with all duplicates removed; in other words, a vector of only the 'unique' elements. Try unique(c(3, 4, 5, 5, 5, 6, 6)) to see how it works.

> unique(c(3, 4, 5, 5, 5, 6, 6))
[1] 3 4 5 6

| Excellent job!

Excellent.

  |=======================================================================                  |  80%
| We want to know the unique values for each variable in the flags dataset. To accomplish this,
| use lapply() to apply the unique() function to each column in the flags dataset, storing the
| result in a variable called unique_vals.

We want the unique (distinct) values of each variable in the flags dataset. Use lapply() to apply unique() to each column of flags, and store the result in unique_vals.

> unique_vals <- lapply(flags, unique)

| Keep up the great work!

Keep it up.

  |=========================================================================                |  82%
| Print the value of unique_vals to the console.

Print unique_vals to the console.

> unique_vals
$name
  [1] Afghanistan              Albania                  Algeria                 
  [4] American-Samoa           Andorra                  Angola                  
  [7] Anguilla                 Antigua-Barbuda          Argentina               
 [10] Argentine                Australia                Austria                 
 [13] Bahamas                  Bahrain                  Bangladesh              
 [16] Barbados                 Belgium                  Belize                  
 [19] Benin                    Bermuda                  Bhutan                  
 [22] Bolivia                  Botswana                 Brazil                  
 [25] British-Virgin-Isles     Brunei                   Bulgaria                
 [28] Burkina                  Burma                    Burundi                 
 [31] Cameroon                 Canada                   Cape-Verde-Islands      
 [34] Cayman-Islands           Central-African-Republic Chad                    
 [37] Chile                    China                    Colombia                
 [40] Comorro-Islands          Congo                    Cook-Islands            
 [43] Costa-Rica               Cuba                     Cyprus                  
 [46] Czechoslovakia           Denmark                  Djibouti                
 [49] Dominica                 Dominican-Republic       Ecuador                 
 [52] Egypt                    El-Salvador              Equatorial-Guinea       
 [55] Ethiopia                 Faeroes                  Falklands-Malvinas      
 [58] Fiji                     Finland                  France                  
 [61] French-Guiana            French-Polynesia         Gabon                   
 [64] Gambia                   Germany-DDR              Germany-FRG             
 [67] Ghana                    Gibraltar                Greece                  
 [70] Greenland                Grenada                  Guam                    
 [73] Guatemala                Guinea                   Guinea-Bissau           
 [76] Guyana                   Haiti                    Honduras                
 [79] Hong-Kong                Hungary                  Iceland                 
 [82] India                    Indonesia                Iran                    
 [85] Iraq                     Ireland                  Israel                  
 [88] Italy                    Ivory-Coast              Jamaica                 
 [91] Japan                    Jordan                   Kampuchea               
 [94] Kenya                    Kiribati                 Kuwait                  
 [97] Laos                     Lebanon                  Lesotho                 
[100] Liberia                  Libya                    Liechtenstein           
[103] Luxembourg               Malagasy                 Malawi                  
[106] Malaysia                 Maldive-Islands          Mali                    
[109] Malta                    Marianas                 Mauritania              
[112] Mauritius                Mexico                   Micronesia              
[115] Monaco                   Mongolia                 Montserrat              
[118] Morocco                  Mozambique               Nauru                   
[121] Nepal                    Netherlands              Netherlands-Antilles    
[124] New-Zealand              Nicaragua                Niger                   
[127] Nigeria                  Niue                     North-Korea             
[130] North-Yemen              Norway                   Oman                    
[133] Pakistan                 Panama                   Papua-New-Guinea        
[136] Parguay                  Peru                     Philippines             
[139] Poland                   Portugal                 Puerto-Rico             
[142] Qatar                    Romania                  Rwanda                  
[145] San-Marino               Sao-Tome                 Saudi-Arabia            
[148] Senegal                  Seychelles               Sierra-Leone            
[151] Singapore                Soloman-Islands          Somalia                 
[154] South-Africa             South-Korea              South-Yemen             
[157] Spain                    Sri-Lanka                St-Helena               
[160] St-Kitts-Nevis           St-Lucia                 St-Vincent              
[163] Sudan                    Surinam                  Swaziland               
[166] Sweden                   Switzerland              Syria                   
[169] Taiwan                   Tanzania                 Thailand                
[172] Togo                     Tonga                    Trinidad-Tobago         
[175] Tunisia                  Turkey                   Turks-Cocos-Islands     
[178] Tuvalu                   UAE                      Uganda                  
[181] UK                       Uruguay                  US-Virgin-Isles         
[184] USA                      USSR                     Vanuatu                 
[187] Vatican-City             Venezuela                Vietnam                 
[190] Western-Samoa            Yugoslavia               Zaire                   
[193] Zambia                   Zimbabwe                
194 Levels: Afghanistan Albania Algeria American-Samoa Andorra Angola Anguilla ... Zimbabwe

$landmass
[1] 5 3 4 6 1 2

$zone
[1] 1 3 2 4

$area
  [1]   648    29  2388     0  1247  2777  7690    84    19     1   143    31    23   113    47
 [16]  1099   600  8512     6   111   274   678    28   474  9976     4   623  1284   757  9561
 [31]  1139     2   342    51   115     9   128    43    22    49   284  1001    21  1222    12
 [46]    18   337   547    91   268    10   108   249   239   132  2176   109   246    36   215
 [61]   112    93   103  3268  1904  1648   435    70   301   323    11   372    98   181   583
 [76]   236    30  1760     3   587   118   333  1240  1031  1973  1566   447   783   140    41
 [91]  1267   925   121   195   324   212   804    76   463   407  1285   300   313    92   237
[106]    26  2150   196    72   637  1221    99   288   505    66  2506    63    17   450   185
[121]   945   514    57     5   164   781   245   178  9363 22402    15   912   256   905   753
[136]   391

$population
 [1]   16    3   20    0    7   28   15    8   90   10    1    6  119    9   35    4   24    2   11
[20] 1008    5   47   31   54   17   61   14  684  157   39   57  118   13   77   12   56   18   84
[39]   48   36   22   29   38   49   45  231  274   60

$language
 [1] 10  6  8  1  2  4  3  5  7  9

$religion
[1] 2 6 1 0 5 3 4 7

$bars
[1] 0 2 3 1 5

$stripes
 [1]  3  0  2  1  5  9 11 14  4  6 13  7

$colours
[1] 5 3 2 8 6 4 7 1

$red
[1] 1 0

$green
[1] 1 0

$blue
[1] 0 1

$gold
[1] 1 0

$white
[1] 1 0

$black
[1] 1 0

$orange
[1] 0 1

$mainhue
[1] green  red    blue   gold   white  orange black  brown 
Levels: black blue brown gold green orange red white

$circles
[1] 0 1 4 2

$crosses
[1] 0 1 2

$saltires
[1] 0 1

$quarters
[1] 0 1 4

$sunstars
 [1]  1  0  6 22 14  3  4  5 15 10  7  2  9 50

$crescent
[1] 0 1

$triangle
[1] 0 1

$icon
[1] 1 0

$animate
[1] 0 1

$text
[1] 0 1

$topleft
[1] black  red    green  blue   white  orange gold  
Levels: black blue gold green orange red white

$botright
[1] green  red    white  black  blue   gold   orange brown 
Levels: black blue brown gold green orange red white


| Great job!

Great job.

  |===========================================================================              |  84%
| Since unique_vals is a list, you can use what you've learned to determine the length of each
| element of unique_vals (i.e. the number of unique values for each variable). Simplify the
| result, if possible. Hint: Apply the length() function to each element of unique_vals.

Since unique_vals is a list, you can find the length of each of its elements (that is, the number of distinct values of each variable). Simplify the result if possible. Hint: apply length() to each element of unique_vals.

> sapply(unique_vals, length)
      name   landmass       zone       area population   language   religion       bars    stripes 
       194          6          4        136         48         10          8          5         12 
   colours        red      green       blue       gold      white      black     orange    mainhue 
         8          2          2          2          2          2          2          2          8 
   circles    crosses   saltires   quarters   sunstars   crescent   triangle       icon    animate 
         4          3          2          3         14          2          2          2          2 
      text    topleft   botright 
         2          7          8 

| You nailed it! Good job!

Nicely done.

  |=============================================================================            |  86%
| The fact that the elements of the unique_vals list are all vectors of *different* length poses a
| problem for sapply(), since there's no obvious way of simplifying the result.

...

The elements of unique_vals are vectors of different lengths. That is a problem for sapply(), because there is no obvious way to simplify such a result.
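The fallback to a list can be reproduced with two short columns. The data frame toy below is made up for illustration; its columns have three and two distinct values respectively.

```r
# Hypothetical data: 'zone' has 3 distinct values, 'red' has 2.
toy <- data.frame(
  zone = c(1, 3, 1, 2),
  red  = c(1, 0, 1, 1)
)

# The per-column results have different lengths, so sapply()
# cannot build a vector or matrix and returns a plain list.
u <- sapply(toy, unique)
print(is.list(u))  # TRUE

# The lengths themselves are all of length one, so this call simplifies.
n_unique <- sapply(u, length)
print(n_unique)
```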

  |==============================================================================           |  88%
| Use sapply() to apply the unique() function to each column of the flags dataset to see that you
| get the same unsimplified list that you got from lapply().

Apply unique() to each column of the flags dataset with sapply(), and confirm that you get the same unsimplified list that lapply() returned.

> sapply(flags, unique)
$name
  [1] Afghanistan              Albania                  Algeria                 
  [4] American-Samoa           Andorra                  Angola                  
  [7] Anguilla                 Antigua-Barbuda          Argentina               
 [10] Argentine                Australia                Austria                 
 [13] Bahamas                  Bahrain                  Bangladesh              
 [16] Barbados                 Belgium                  Belize                  
 [19] Benin                    Bermuda                  Bhutan                  
 [22] Bolivia                  Botswana                 Brazil                  
 [25] British-Virgin-Isles     Brunei                   Bulgaria                
 [28] Burkina                  Burma                    Burundi                 
 [31] Cameroon                 Canada                   Cape-Verde-Islands      
 [34] Cayman-Islands           Central-African-Republic Chad                    
 [37] Chile                    China                    Colombia                
 [40] Comorro-Islands          Congo                    Cook-Islands            
 [43] Costa-Rica               Cuba                     Cyprus                  
 [46] Czechoslovakia           Denmark                  Djibouti                
 [49] Dominica                 Dominican-Republic       Ecuador                 
 [52] Egypt                    El-Salvador              Equatorial-Guinea       
 [55] Ethiopia                 Faeroes                  Falklands-Malvinas      
 [58] Fiji                     Finland                  France                  
 [61] French-Guiana            French-Polynesia         Gabon                   
 [64] Gambia                   Germany-DDR              Germany-FRG             
 [67] Ghana                    Gibraltar                Greece                  
 [70] Greenland                Grenada                  Guam                    
 [73] Guatemala                Guinea                   Guinea-Bissau           
 [76] Guyana                   Haiti                    Honduras                
 [79] Hong-Kong                Hungary                  Iceland                 
 [82] India                    Indonesia                Iran                    
 [85] Iraq                     Ireland                  Israel                  
 [88] Italy                    Ivory-Coast              Jamaica                 
 [91] Japan                    Jordan                   Kampuchea               
 [94] Kenya                    Kiribati                 Kuwait                  
 [97] Laos                     Lebanon                  Lesotho                 
[100] Liberia                  Libya                    Liechtenstein           
[103] Luxembourg               Malagasy                 Malawi                  
[106] Malaysia                 Maldive-Islands          Mali                    
[109] Malta                    Marianas                 Mauritania              
[112] Mauritius                Mexico                   Micronesia              
[115] Monaco                   Mongolia                 Montserrat              
[118] Morocco                  Mozambique               Nauru                   
[121] Nepal                    Netherlands              Netherlands-Antilles    
[124] New-Zealand              Nicaragua                Niger                   
[127] Nigeria                  Niue                     North-Korea             
[130] North-Yemen              Norway                   Oman                    
[133] Pakistan                 Panama                   Papua-New-Guinea        
[136] Parguay                  Peru                     Philippines             
[139] Poland                   Portugal                 Puerto-Rico             
[142] Qatar                    Romania                  Rwanda                  
[145] San-Marino               Sao-Tome                 Saudi-Arabia            
[148] Senegal                  Seychelles               Sierra-Leone            
[151] Singapore                Soloman-Islands          Somalia                 
[154] South-Africa             South-Korea              South-Yemen             
[157] Spain                    Sri-Lanka                St-Helena               
[160] St-Kitts-Nevis           St-Lucia                 St-Vincent              
[163] Sudan                    Surinam                  Swaziland               
[166] Sweden                   Switzerland              Syria                   
[169] Taiwan                   Tanzania                 Thailand                
[172] Togo                     Tonga                    Trinidad-Tobago         
[175] Tunisia                  Turkey                   Turks-Cocos-Islands     
[178] Tuvalu                   UAE                      Uganda                  
[181] UK                       Uruguay                  US-Virgin-Isles         
[184] USA                      USSR                     Vanuatu                 
[187] Vatican-City             Venezuela                Vietnam                 
[190] Western-Samoa            Yugoslavia               Zaire                   
[193] Zambia                   Zimbabwe                
194 Levels: Afghanistan Albania Algeria American-Samoa Andorra Angola Anguilla ... Zimbabwe

$landmass
[1] 5 3 4 6 1 2

$zone
[1] 1 3 2 4

$area
  [1]   648    29  2388     0  1247  2777  7690    84    19     1   143    31    23   113    47
 [16]  1099   600  8512     6   111   274   678    28   474  9976     4   623  1284   757  9561
 [31]  1139     2   342    51   115     9   128    43    22    49   284  1001    21  1222    12
 [46]    18   337   547    91   268    10   108   249   239   132  2176   109   246    36   215
 [61]   112    93   103  3268  1904  1648   435    70   301   323    11   372    98   181   583
 [76]   236    30  1760     3   587   118   333  1240  1031  1973  1566   447   783   140    41
 [91]  1267   925   121   195   324   212   804    76   463   407  1285   300   313    92   237
[106]    26  2150   196    72   637  1221    99   288   505    66  2506    63    17   450   185
[121]   945   514    57     5   164   781   245   178  9363 22402    15   912   256   905   753
[136]   391

$population
 [1]   16    3   20    0    7   28   15    8   90   10    1    6  119    9   35    4   24    2   11
[20] 1008    5   47   31   54   17   61   14  684  157   39   57  118   13   77   12   56   18   84
[39]   48   36   22   29   38   49   45  231  274   60

$language
 [1] 10  6  8  1  2  4  3  5  7  9

$religion
[1] 2 6 1 0 5 3 4 7

$bars
[1] 0 2 3 1 5

$stripes
 [1]  3  0  2  1  5  9 11 14  4  6 13  7

$colours
[1] 5 3 2 8 6 4 7 1

$red
[1] 1 0

$green
[1] 1 0

$blue
[1] 0 1

$gold
[1] 1 0

$white
[1] 1 0

$black
[1] 1 0

$orange
[1] 0 1

$mainhue
[1] green  red    blue   gold   white  orange black  brown 
Levels: black blue brown gold green orange red white

$circles
[1] 0 1 4 2

$crosses
[1] 0 1 2

$saltires
[1] 0 1

$quarters
[1] 0 1 4

$sunstars
 [1]  1  0  6 22 14  3  4  5 15 10  7  2  9 50

$crescent
[1] 0 1

$triangle
[1] 0 1

$icon
[1] 1 0

$animate
[1] 0 1

$text
[1] 0 1

$topleft
[1] black  red    green  blue   white  orange gold  
Levels: black blue gold green orange red white

$botright
[1] green  red    white  black  blue   gold   orange brown 
Levels: black blue brown gold green orange red white


| Excellent work!

Excellent.

  |================================================================================         |  90%
| Occasionally, you may need to apply a function that is not yet defined, thus requiring you to
| write your own. Writing functions in R is beyond the scope of this lesson, but let's look at a
| quick example of how you might do so in the context of loop functions.

...

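Occasionally you need to apply a function that does not exist yet, so you have to write your own. Writing functions in R is beyond the scope of this lesson, but let's look at a quick example in the context of loop functions.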

  |==================================================================================       |  92%
| Pretend you are interested in only the second item from each element of the unique_vals list
| that you just created. Since each element of the unique_vals list is a vector and we're not
| aware of any built-in function in R that returns the second element of a vector, we will
| construct our own function.

...

Suppose you only need the second item of each element of unique_vals. Each element of the list is a vector, and we are not aware of a built-in R function that returns the second element of a vector, so we write our own.

  |====================================================================================     |  94%
| lapply(unique_vals, function(elem) elem[2]) will return a list containing the second item from
| each element of the unique_vals list. Note that our function takes one argument, elem, which is
| just a 'dummy variable' that takes on the value of each element of unique_vals, in turn.

lapply(unique_vals, function(elem) elem[2]) returns a list containing the second item of each element of unique_vals. The function takes a single argument, elem, which is just a 'dummy variable' that takes the value of each element of unique_vals in turn.

> lapply(unique_vals, function(elem) elem[2])
$name
[1] Albania
194 Levels: Afghanistan Albania Algeria American-Samoa Andorra Angola Anguilla ... Zimbabwe

$landmass
[1] 3

$zone
[1] 3

$area
[1] 29

$population
[1] 3

$language
[1] 6

$religion
[1] 6

$bars
[1] 2

$stripes
[1] 0

$colours
[1] 3

$red
[1] 0

$green
[1] 0

$blue
[1] 1

$gold
[1] 0

$white
[1] 0

$black
[1] 0

$orange
[1] 1

$mainhue
[1] red
Levels: black blue brown gold green orange red white

$circles
[1] 1

$crosses
[1] 1

$saltires
[1] 1

$quarters
[1] 1

$sunstars
[1] 0

$crescent
[1] 1

$triangle
[1] 1

$icon
[1] 0

$animate
[1] 1

$text
[1] 1

$topleft
[1] red
Levels: black blue gold green orange red white

$botright
[1] red
Levels: black blue brown gold green orange red white


| All that hard work is paying off!

The hard work is paying off.

  |=====================================================================================    |  96%
| The only difference between previous examples and this one is that we are defining and using our
| own function right in the call to lapply(). Our function has no name and disappears as soon as
| lapply() is done using it. So-called 'anonymous functions' can be very useful when one of R's
| built-in functions isn't an option.

...

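The only difference from the earlier examples is that here we define our own function right inside the call to lapply(). The function has no name and disappears as soon as lapply() has finished with it. Such 'anonymous functions' are useful when none of R's built-in functions does the job.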
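For comparison, the same operation can be written with an anonymous function, with a named function, or by passing R's subsetting operator directly. The list vals and the name get_second below are made up for illustration; the last form is a common idiom rather than something the lesson itself uses.

```r
# Hypothetical list with elements of different types and lengths.
vals <- list(a = c(10, 20, 30), b = c("x", "y"))

# 1. Anonymous function, defined inside the call (as in the lesson).
second_anon <- lapply(vals, function(elem) elem[2])

# 2. The same function given a name first.
get_second <- function(elem) elem[2]
second_named <- lapply(vals, get_second)

# 3. The subsetting operator `[` is itself a function; extra
#    arguments to lapply() (here 2) are passed on to it.
second_op <- lapply(vals, `[`, 2)

print(second_anon)
```

All three calls return the same list; the anonymous form simply avoids leaving a one-off helper in the workspace.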

  |=======================================================================================  |  98%
| In this lesson, you learned how to use the powerful lapply() and sapply() functions to apply an
| operation over the elements of a list. In the next lesson, we'll take a look at some close
| relatives of lapply() and sapply().

...

In this lesson you learned how to use the powerful lapply() and sapply() functions to apply an operation to each element of a list. The next lesson covers some close relatives of lapply() and sapply().

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

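Would you like to receive credit on Coursera.org for completing this course?

Choose 1. This has been explained several times already, so the details are omitted here.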

| All that practice is paying off!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

The practice is paying off.

You have reached the end of the lesson and are returned to the main menu.

Choose a course, or type 0 to exit swirl. Here, 0 was entered (by SL) to exit.

| Leaving swirl now. Type swirl() to resume.

Leaving swirl. To resume, type swirl().

>

FLAGs Dataset

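Basic information about the FLAGs dataset used in 2.10 can be displayed with the viewinfo() function; the output is shown below. It describes what the dataset is and what each variable (column) records. Many datasets come with documentation like this, so it is worth checking it before use.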

flag.names.txt

1. Title: Flag database

2. Source Information
   -- Creators: Collected primarily from the "Collins Gem Guide to Flags":
      Collins Publishers (1986).
   -- Donor: Richard S. Forsyth 
             8 Grosvenor Avenue
             Mapperley Park
             Nottingham NG3 5DX
             0602-621676
   -- Date: 5/15/1990

3. Past Usage:
   -- None known other than what is shown in Forsyth's PC/BEAGLE User's Guide.

4. Relevant Information:
   -- This data file contains details of various nations and their flags.
      In this file the fields are separated by spaces (not commas).  With
      this data you can try things like predicting the religion of a country
      from its size and the colours in its flag.  
   -- 10 attributes are numeric-valued.  The remainder are either Boolean-
      or nominal-valued.

5. Number of Instances: 194

6. Number of attributes: 30 (overall)

7. Attribute Information:
   1. name  Name of the country concerned
   2. landmass  1=N.America, 2=S.America, 3=Europe, 4=Africa, 5=Asia, 6=Oceania
   3. zone  Geographic quadrant, based on Greenwich and the Equator
                1=NE, 2=SE, 3=SW, 4=NW
   4. area  in thousands of square km
   5. population    in round millions
   6. language 1=English, 2=Spanish, 3=French, 4=German, 5=Slavic, 6=Other 
               Indo-European, 7=Chinese, 8=Arabic, 
               9=Japanese/Turkish/Finnish/Magyar, 10=Others
   7. religion 0=Catholic, 1=Other Christian, 2=Muslim, 3=Buddhist, 4=Hindu,
               5=Ethnic, 6=Marxist, 7=Others
   8. bars     Number of vertical bars in the flag
   9. stripes  Number of horizontal stripes in the flag
  10. colours  Number of different colours in the flag
  11. red      0 if red absent, 1 if red present in the flag
  12. green    same for green
  13. blue     same for blue
  14. gold     same for gold (also yellow)
  15. white    same for white
  16. black    same for black
  17. orange   same for orange (also brown)
  18. mainhue  predominant colour in the flag (tie-breaks decided by taking
               the topmost hue, if that fails then the most central hue,
               and if that fails the leftmost hue)
  19. circles  Number of circles in the flag
  20. crosses  Number of (upright) crosses
  21. saltires Number of diagonal crosses
  22. quarters Number of quartered sections
  23. sunstars Number of sun or star symbols
  24. crescent 1 if a crescent moon symbol present, else 0
  25. triangle 1 if any triangles present, 0 otherwise
  26. icon     1 if an inanimate image present (e.g., a boat), otherwise 0
  27. animate  1 if an animate image (e.g., an eagle, a tree, a human hand)
               present, 0 otherwise
  28. text     1 if any letters or writing on the flag (e.g., a motto or
               slogan), 0 otherwise
  29. topleft  colour in the top-left corner (moving right to decide 
               tie-breaks)
  30. botright Colour in the bottom-right corner (moving left to decide 
               tie-breaks)

8. Missing values: None

2.11. vapply and tapply

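This lesson covers vapply and tapply, relatives of the lapply and sapply functions from the previous lesson.
The translation and commentary start from the lesson selection. If anything is unclear, review 1.2 or the beginning of 1.1.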

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 11

Lesson 11, vapply and tapply, was selected.

  |                                                                                         |   0%

| In the last lesson, you learned about the two most fundamental members of R's *apply family of
| functions: lapply() and sapply(). Both take a list as input, apply a function to each element of
| the list, then combine and return the result. lapply() always returns a list, whereas sapply()
| attempts to simplify the result.

...

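In the previous lesson you learned about the two most fundamental members of R's *apply family: lapply() and sapply(). Both take a list as input, apply a function to each element, then combine and return the result. lapply() always returns a list, whereas sapply() tries to simplify the result.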

  |====                                                                                     |   4%
| In this lesson, you'll learn how to use vapply() and tapply(), each of which serves a very
| specific purpose within the Split-Apply-Combine methodology. For consistency, we'll use the same
| dataset we used in the 'lapply and sapply' lesson.

...

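This lesson covers vapply() and tapply(), each of which serves a very specific purpose within the Split-Apply-Combine methodology (split the data, apply a function, combine the results). For consistency, it uses the same dataset as the 'lapply and sapply' lesson.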
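As a rough preview of what Split-Apply-Combine means, the sketch below does the three steps by hand with split() and sapply(), then repeats them with tapply(), and finally shows vapply() with an explicit result type. The data frame toy and its values are made up for illustration and are not taken from the flags dataset.

```r
# Hypothetical data: a grouping column and a numeric column.
toy <- data.frame(
  landmass   = c(1, 1, 2, 2, 2),
  population = c(10, 30, 5, 15, 25)
)

# Split the values by group, apply mean() to each piece, combine.
pieces  <- split(toy$population, toy$landmass)
by_hand <- sapply(pieces, mean)

# tapply() performs the same three steps in one call.
by_tapply <- tapply(toy$population, toy$landmass, mean)

# vapply() works like sapply(), but the third argument states what
# each result must look like (here: one number), and it raises an
# error if a result does not match.
col_means <- vapply(toy, mean, numeric(1))

print(by_hand)
print(by_tapply)
print(col_means)
```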

  |=======                                                                                  |   8%
| The Flags dataset from the UCI Machine Learning Repository contains details of various nations
| and their flags. More information may be found here:
| http://archive.ics.uci.edu/ml/datasets/Flags

...

The Flags dataset comes from the UCI Machine Learning Repository (University of California, Irvine). More information is available at http://archive.ics.uci.edu/ml/datasets/Flags.

  |===========                                                                              |  12%
| I've stored the data in a variable called flags. If it's been a while since you completed the
| 'lapply and sapply' lesson, you may want to reacquaint yourself with the data by using functions
| like dim(), head(), str(), and summary() when you return to the prompt (>). You can also type
| viewinfo() at the prompt to bring up some documentation for the dataset. Let's get started!

...

The data is stored in a variable called flags. If some time has passed since the 'lapply and sapply' lesson, try dim(), head(), str(), and summary() when you return to the prompt (>). You can also type viewinfo() to bring up documentation for the dataset. Let's get started.

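The information shown by viewinfo() is in a file called flag.names.txt. Its contents appear just before the notes for this lesson, at the end of the previous lesson. The Console Log of dim(), head(), str(), and summary() is also included at the end of this lesson for reference.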

  |==============                                                                           |  16%
| As you saw in the last lesson, the unique() function returns a vector of the unique values
| contained in the object passed to it. Therefore, sapply(flags, unique) returns a list containing
| one vector of unique values for each column of the flags dataset. Try it again now.

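As in the previous lesson, unique() returns a vector of the unique values contained in the object passed to it. sapply(flags, unique) therefore returns a list containing one vector of unique values for each column of the flags dataset. Try it.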

> sapply(flags, unique)
$name
  [1] Afghanistan              Albania                  Algeria                 
  [4] American-Samoa           Andorra                  Angola                  
  [7] Anguilla                 Antigua-Barbuda          Argentina               
 [10] Argentine                Australia                Austria                 
 [13] Bahamas                  Bahrain                  Bangladesh              
 [16] Barbados                 Belgium                  Belize                  
 [19] Benin                    Bermuda                  Bhutan                  
 [22] Bolivia                  Botswana                 Brazil                  
 [25] British-Virgin-Isles     Brunei                   Bulgaria                
 [28] Burkina                  Burma                    Burundi                 
 [31] Cameroon                 Canada                   Cape-Verde-Islands      
 [34] Cayman-Islands           Central-African-Republic Chad                    
 [37] Chile                    China                    Colombia                
 [40] Comorro-Islands          Congo                    Cook-Islands            
 [43] Costa-Rica               Cuba                     Cyprus                  
 [46] Czechoslovakia           Denmark                  Djibouti                
 [49] Dominica                 Dominican-Republic       Ecuador                 
 [52] Egypt                    El-Salvador              Equatorial-Guinea       
 [55] Ethiopia                 Faeroes                  Falklands-Malvinas      
 [58] Fiji                     Finland                  France                  
 [61] French-Guiana            French-Polynesia         Gabon                   
 [64] Gambia                   Germany-DDR              Germany-FRG             
 [67] Ghana                    Gibraltar                Greece                  
 [70] Greenland                Grenada                  Guam                    
 [73] Guatemala                Guinea                   Guinea-Bissau           
 [76] Guyana                   Haiti                    Honduras                
 [79] Hong-Kong                Hungary                  Iceland                 
 [82] India                    Indonesia                Iran                    
 [85] Iraq                     Ireland                  Israel                  
 [88] Italy                    Ivory-Coast              Jamaica                 
 [91] Japan                    Jordan                   Kampuchea               
 [94] Kenya                    Kiribati                 Kuwait                  
 [97] Laos                     Lebanon                  Lesotho                 
[100] Liberia                  Libya                    Liechtenstein           
[103] Luxembourg               Malagasy                 Malawi                  
[106] Malaysia                 Maldive-Islands          Mali                    
[109] Malta                    Marianas                 Mauritania              
[112] Mauritius                Mexico                   Micronesia              
[115] Monaco                   Mongolia                 Montserrat              
[118] Morocco                  Mozambique               Nauru                   
[121] Nepal                    Netherlands              Netherlands-Antilles    
[124] New-Zealand              Nicaragua                Niger                   
[127] Nigeria                  Niue                     North-Korea             
[130] North-Yemen              Norway                   Oman                    
[133] Pakistan                 Panama                   Papua-New-Guinea        
[136] Parguay                  Peru                     Philippines             
[139] Poland                   Portugal                 Puerto-Rico             
[142] Qatar                    Romania                  Rwanda                  
[145] San-Marino               Sao-Tome                 Saudi-Arabia            
[148] Senegal                  Seychelles               Sierra-Leone            
[151] Singapore                Soloman-Islands          Somalia                 
[154] South-Africa             South-Korea              South-Yemen             
[157] Spain                    Sri-Lanka                St-Helena               
[160] St-Kitts-Nevis           St-Lucia                 St-Vincent              
[163] Sudan                    Surinam                  Swaziland               
[166] Sweden                   Switzerland              Syria                   
[169] Taiwan                   Tanzania                 Thailand                
[172] Togo                     Tonga                    Trinidad-Tobago         
[175] Tunisia                  Turkey                   Turks-Cocos-Islands     
[178] Tuvalu                   UAE                      Uganda                  
[181] UK                       Uruguay                  US-Virgin-Isles         
[184] USA                      USSR                     Vanuatu                 
[187] Vatican-City             Venezuela                Vietnam                 
[190] Western-Samoa            Yugoslavia               Zaire                   
[193] Zambia                   Zimbabwe                
194 Levels: Afghanistan Albania Algeria American-Samoa Andorra Angola Anguilla ... Zimbabwe

$landmass
[1] 5 3 4 6 1 2

$zone
[1] 1 3 2 4

$area
  [1]   648    29  2388     0  1247  2777  7690    84    19     1   143    31    23   113    47
 [16]  1099   600  8512     6   111   274   678    28   474  9976     4   623  1284   757  9561
 [31]  1139     2   342    51   115     9   128    43    22    49   284  1001    21  1222    12
 [46]    18   337   547    91   268    10   108   249   239   132  2176   109   246    36   215
 [61]   112    93   103  3268  1904  1648   435    70   301   323    11   372    98   181   583
 [76]   236    30  1760     3   587   118   333  1240  1031  1973  1566   447   783   140    41
 [91]  1267   925   121   195   324   212   804    76   463   407  1285   300   313    92   237
[106]    26  2150   196    72   637  1221    99   288   505    66  2506    63    17   450   185
[121]   945   514    57     5   164   781   245   178  9363 22402    15   912   256   905   753
[136]   391

$population
 [1]   16    3   20    0    7   28   15    8   90   10    1    6  119    9   35    4   24    2   11
[20] 1008    5   47   31   54   17   61   14  684  157   39   57  118   13   77   12   56   18   84
[39]   48   36   22   29   38   49   45  231  274   60

$language
 [1] 10  6  8  1  2  4  3  5  7  9

$religion
[1] 2 6 1 0 5 3 4 7

$bars
[1] 0 2 3 1 5

$stripes
 [1]  3  0  2  1  5  9 11 14  4  6 13  7

$colours
[1] 5 3 2 8 6 4 7 1

$red
[1] 1 0

$green
[1] 1 0

$blue
[1] 0 1

$gold
[1] 1 0

$white
[1] 1 0

$black
[1] 1 0

$orange
[1] 0 1

$mainhue
[1] green  red    blue   gold   white  orange black  brown 
Levels: black blue brown gold green orange red white

$circles
[1] 0 1 4 2

$crosses
[1] 0 1 2

$saltires
[1] 0 1

$quarters
[1] 0 1 4

$sunstars
 [1]  1  0  6 22 14  3  4  5 15 10  7  2  9 50

$crescent
[1] 0 1

$triangle
[1] 0 1

$icon
[1] 1 0

$animate
[1] 0 1

$text
[1] 0 1

$topleft
[1] black  red    green  blue   white  orange gold  
Levels: black blue gold green orange red white

$botright
[1] green  red    white  black  blue   gold   orange brown 
Levels: black blue brown gold green orange red white


| Keep working like that and you'll get there!

Keep going like that.

  |==================                                                                       |  20%
| What if you had forgotten how unique() works and mistakenly thought it returns the *number* of
| unique values contained in the object passed to it? Then you might have incorrectly expected
| sapply(flags, unique) to return a numeric vector, since each element of the list returned would
| contain a single number and sapply() could then simplify the result to a vector.

...

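Suppose you had forgotten how unique() works and mistakenly thought it returns the number of unique values in the object passed to it. You would then expect each element of the result to be a single number, and so expect sapply(flags, unique) to simplify the result to a numeric vector.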
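The two readings of unique() give differently shaped results. The sketch below uses a small made-up data frame `toy` (hypothetical, standing in for flags) to contrast the unique values themselves with a count of them:

```r
# Hypothetical toy data frame standing in for flags.
toy <- data.frame(red = c(1, 0, 1, 1), zone = c(1, 3, 2, 1))

# What unique() really returns: one vector per column, of differing lengths,
# so sapply() cannot simplify and returns a list.
unique_vals <- sapply(toy, unique)

# What the mistaken reading expects: one number per column,
# which sapply() simplifies to a named integer vector.
unique_counts <- sapply(toy, function(col) length(unique(col)))

str(unique_vals)
print(unique_counts)
```

Counting requires wrapping unique() in length() explicitly; unique() alone never returns a count.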

  |=====================                                                                    |  24%
| When working interactively (at the prompt), this is not much of a problem, since you see the
| result immediately and will quickly recognize your mistake. However, when working
| non-interactively (e.g. writing your own functions), a misunderstanding may go undetected and
| cause incorrect results later on. Therefore, you may wish to be more careful and that's where
| vapply() is useful.

...

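When working interactively at the prompt, this is not much of a problem, because you see the result immediately and notice the mistake. When working non-interactively (for example, writing your own functions), a misunderstanding may go undetected and cause incorrect results later. That calls for more care, and this is where vapply() is useful.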

  |=========================                                                                |  28%
| Whereas sapply() tries to 'guess' the correct format of the result, vapply() allows you to
| specify it explicitly. If the result doesn't match the format you specify, vapply() will throw
| an error, causing the operation to stop. This can prevent significant problems in your code that
| might be caused by getting unexpected return values from sapply().

...

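sapply() guesses the format of the result, whereas vapply() lets you specify it explicitly. If the result does not match the specified format, vapply() throws an error and the operation stops. This prevents serious problems caused by unexpected return values from sapply().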
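The stop-on-mismatch behavior can be observed without halting a script by catching the error. This is a minimal sketch on a made-up data frame `toy` (hypothetical); the exact wording of the error message depends on the R locale, so it is only printed, not relied on:

```r
# Hypothetical toy data frame standing in for flags.
toy <- data.frame(red = c(1, 0, 1, 1), zone = c(1, 3, 2, 1))

# unique() returns more than one value per column, so the numeric(1)
# template does not match and vapply() signals an error.
msg <- tryCatch(
  vapply(toy, unique, numeric(1)),
  error = function(e) conditionMessage(e)
)
print(msg)

# A template that does match: one integer count per column.
counts <- vapply(toy, function(col) length(unique(col)), integer(1))
print(counts)
```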

  |============================                                                             |  32%
| Try vapply(flags, unique, numeric(1)), which says that you expect each element of the result to
| be a numeric vector of length 1. Since this is NOT actually the case, YOU WILL GET AN ERROR.
| Once you get the error, type ok() to continue to the next question.

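Enter vapply(flags, unique, numeric(1)). This states that you expect each element of the result to be a numeric vector of length 1. Since that is not actually the case, you will get an error. Once you get the error, type ok() to continue to the next question.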

> vapply(flags, unique, numeric(1))
Error in vapply(flags, unique, numeric(1)) : values must be length 1,
 but FUN(X[[1]]) result is length 194
[Error message in a Japanese locale] vapply(flags, unique, numeric(1)) でエラー: 
   値の長さは 1 でなければなりません、 
 しかし、FUN(X[[1]]) の結果の長さが 194 です 

> ok()

| You are doing so well!

You are doing very well.

  |================================                                                         |  36%
| Recall from the previous lesson that sapply(flags, class) will return a character vector
| containing the class of each column in the dataset. Try that again now to see the result.

As in the previous lesson, sapply(flags, class) returns a character vector containing the class of each column in the dataset. Try it again.

> sapply(flags, class)
      name   landmass       zone       area population   language   religion       bars    stripes 
  "factor"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer" 
   colours        red      green       blue       gold      white      black     orange    mainhue 
 "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"   "factor" 
   circles    crosses   saltires   quarters   sunstars   crescent   triangle       icon    animate 
 "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer" 
      text    topleft   botright 
 "integer"   "factor"   "factor" 

| All that hard work is paying off!

Your hard work is paying off.

  |====================================                                                     |  40%
| If we wish to be explicit about the format of the result we expect, we can use vapply(flags,
| class, character(1)). The 'character(1)' argument tells R that we expect the class function to
| return a character vector of length 1 when applied to EACH column of the flags dataset. Try it
| now.

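To be explicit about the format of the expected result, use vapply(flags, class, character(1)). The 'character(1)' argument tells R that the class function is expected to return a character vector of length 1 when applied to each column of the flags dataset. Try it.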

> vapply(flags, class, character(1))
      name   landmass       zone       area population   language   religion       bars    stripes 
  "factor"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer" 
   colours        red      green       blue       gold      white      black     orange    mainhue 
 "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"   "factor" 
   circles    crosses   saltires   quarters   sunstars   crescent   triangle       icon    animate 
 "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer" 
      text    topleft   botright 
 "integer"   "factor"   "factor" 

| You nailed it! Good job!

Well done.

  |=======================================                                                  |  44%
| Note that since our expectation was correct (i.e. character(1)), the vapply() result is
| identical to the sapply() result -- a character vector of column classes.

...

Because the expectation (character(1)) was correct, vapply() returned exactly the same result as sapply(): a character vector of column classes.

  |===========================================                                              |  48%
| You might think of vapply() as being 'safer' than sapply(), since it requires you to specify the
| format of the output in advance, instead of just allowing R to 'guess' what you wanted. In
| addition, vapply() may perform faster than sapply() for large datasets. However, when doing data
| analysis interactively (at the prompt), sapply() saves you some typing and will often be good
| enough.

...

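You might think of vapply() as 'safer' than sapply(), since it requires you to specify the format of the output in advance rather than letting R guess. In addition, vapply() may run faster than sapply() on large datasets. When working interactively at the prompt, however, sapply() saves typing and is often good enough.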
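One case where the 'guess' matters is empty input, which is easy to hit inside a function. The sketch below is an illustration that is not part of the swirl lesson, and it reflects the behavior of base R as I understand it: with nothing to simplify, sapply() hands back an empty list, while vapply() still returns the declared type.

```r
empty <- list()

s <- sapply(empty, length)              # nothing to simplify: an empty list comes back
v <- vapply(empty, length, integer(1))  # the template guarantees an integer vector

class(s)
class(v)
```

Code that goes on to do arithmetic on the result would behave differently in the two cases, which is the kind of surprise vapply() is meant to rule out.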

  |==============================================                                           |  52%
| As a data analyst, you'll often wish to split your data up into groups based on the value of
| some variable, then apply a function to the members of each group. The next function we'll look
| at, tapply(), does exactly that.

...

In data analysis you will often want to split the data into groups based on the value of some variable, then apply a function to the members of each group. The next function, tapply(), does exactly that.
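The split, apply, and combine steps can also be written out by hand, which may make it clearer what tapply() does in one call. This is a minimal sketch with made-up vectors `value` and `group` (hypothetical names; `group` plays the role of a column such as landmass):

```r
# Hypothetical toy vectors.
value <- c(4, 6, 10, 20, 30)
group <- c("a", "a", "b", "b", "b")

pieces    <- split(value, group)       # split: one vector per group
by_hand   <- sapply(pieces, mean)      # apply mean to each piece, then combine
by_tapply <- tapply(value, group, mean)  # the same three steps in one call

print(by_hand)
print(by_tapply)
```

Both approaches give one mean per group. tapply() returns a one-dimensional array rather than a plain vector, but the values and names agree.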

  |==================================================                                       |  56%
| Use ?tapply to pull up the documentation.

Use ?tapply to display the documentation.

> ?tapply

| You are amazing!

Amazing.

  |=====================================================                                    |  60%
| The 'landmass' variable in our dataset takes on integer values between 1 and 6, each of which
| represents a different part of the world. Use table(flags$landmass) to see how many
| flags/countries fall into each group.

The 'landmass' variable takes integer values from 1 to 6, each representing a different part of the world. Use table(flags$landmass) to see how many flags/countries fall into each group.

> table(flags$landmass)

 1  2  3  4  5  6 
31 17 35 52 39 20 

| That's a job well done!

Well done.

  |=========================================================                                |  64%
| The 'animate' variable in our dataset takes the value 1 if a country's flag contains an animate
| image (e.g. an eagle, a tree, a human hand) and 0 otherwise. Use table(flags$animate) to see how
| many flags contain an animate image.

The 'animate' variable is 1 if a country's flag contains an animate image (e.g. an eagle, a tree, a human hand) and 0 otherwise. Use table(flags$animate) to see how many flags contain an animate image.

> table(flags$animate)

  0   1 
155  39 

| That's a job well done!

Well done.

  |=============================================================                            |  68%
| This tells us that 39 flags contain an animate object (animate = 1) and 155 do not (animate =
| 0).

...

This shows that 39 flags contain an animate object (animate = 1) and 155 do not (animate = 0).

  |================================================================                         |  72%
| If you take the arithmetic mean of a bunch of 0s and 1s, you get the proportion of 1s. Use
| tapply(flags$animate, flags$landmass, mean) to apply the mean function to the 'animate' variable
| separately for each of the six landmass groups, thus giving us the proportion of flags
| containing an animate image WITHIN each landmass group.

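The arithmetic mean of a set of 0s and 1s is the proportion of 1s. Use tapply(flags$animate, flags$landmass, mean) to apply mean to the 'animate' variable separately for each of the six landmass groups. The result is the proportion of flags containing an animate image within each landmass group.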

> tapply(flags$animate, flags$landmass, mean)
        1         2         3         4         5         6 
0.4193548 0.1764706 0.1142857 0.1346154 0.1538462 0.3000000 

| Perseverance, that's the answer.

Perseverance pays off. That's the answer.
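The claim that the mean of 0s and 1s equals the proportion of 1s can be checked against the counts from table(flags$animate) above (155 zeros and 39 ones). The sketch rebuilds a vector with those counts, since the flags data itself is only available inside swirl:

```r
# Rebuild a 0/1 vector with the counts reported by table(flags$animate).
animate <- c(rep(0, 155), rep(1, 39))

p_mean  <- mean(animate)                          # arithmetic mean
p_count <- sum(animate == 1) / length(animate)    # proportion of 1s, counted directly

print(p_mean)
print(p_count)
```

Both expressions evaluate to 39/194, which rounds to the mean of 0.201 reported for animate in the summary(flags) output at the end of this lesson.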

  |====================================================================                     |  76%
| The first landmass group (landmass = 1) corresponds to North America and contains the highest
| proportion of flags with an animate image (0.4194).

...

The first landmass group (landmass = 1) corresponds to North America and has the highest proportion of flags with an animate image (0.4194).

  |=======================================================================                  |  80%
| Similarly, we can look at a summary of population values (in round millions) for countries with
| and without the color red on their flag with tapply(flags$population, flags$red, summary).

Similarly, tapply(flags$population, flags$red, summary) gives a summary of population values (in round millions) for countries with and without the color red on their flag.

> tapply(flags$population, flags$red, summary)
$`0`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.00    3.00   27.63    9.00  684.00 

$`1`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0     0.0     4.0    22.1    15.0  1008.0 


| That's correct!

Correct.

  |===========================================================================              |  84%
| What is the median population (in millions) for countries *without* the color red on their flag?

1: 4.0
2: 3.0
3: 0.0
4: 22.1
5: 27.6
6: 9.0

What is the median population (in round millions) for countries without the color red on their flag? Choose from: 1: 4.0, 2: 3.0, 3: 0.0, 4: 22.1, 5: 27.6, 6: 9.0.

Selection: 2

| You got it!

That's right.
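Instead of reading the median off the printed output, a single value can be pulled out of the tapply() result by name. This is a minimal sketch on made-up vectors `population` and `red` (hypothetical values, not the flags data), so the numbers differ from the lesson output:

```r
# Hypothetical toy data: three countries without red (0) and three with red (1).
population <- c(0, 3, 9, 4, 15, 1008)
red        <- c(0, 0, 0, 1, 1, 1)

# summary() returns several values per group, so tapply() returns a list-like
# result with one element per group, named after the group values.
by_red <- tapply(population, red, summary)
median_no_red <- by_red[["0"]][["Median"]]

# Cross-check: ask for the median directly.
medians <- tapply(population, red, median)

print(median_no_red)
print(medians)
```

The group names are character strings ("0" and "1"), which is why the lesson output shows them in backticks as $`0` and $`1`.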

  |==============================================================================           |  88%
| Lastly, use the same approach to look at a summary of population values for each of the six
| landmasses.

Lastly, use the same approach to get a summary of population values for each of the six landmass groups.

> tapply(flags$population, flags$landmass, summary)
$`1`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.00    0.00   12.29    4.50  231.00 

$`2`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    1.00    6.00   15.71   15.00  119.00 

$`3`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.00    8.00   13.86   16.00   61.00 

$`4`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.000   5.000   8.788   9.750  56.000 

$`5`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    2.00   10.00   69.18   39.00 1008.00 

$`6`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.00    0.00   11.30    1.25  157.00 


| That's the answer I was looking for.

Yes, that is the expected answer.

  |==================================================================================       |  92%
| What is the maximum population (in millions) for the fourth landmass group (Africa)?

1: 1010.0
2: 119.0
3: 5.00
4: 56.00
5: 157.00

What is the maximum population (in round millions) for the fourth landmass group (Africa)? Choose from: 1: 1010.0, 2: 119.0, 3: 5.00, 4: 56.00, 5: 157.00.

Selection: 4

| All that hard work is paying off!

Your hard work is paying off.

  |=====================================================================================    |  96%
| In this lesson, you learned how to use vapply() as a safer alternative to sapply(), which is
| most helpful when writing your own functions. You also learned how to use tapply() to split your
| data into groups based on the value of some variable, then apply a function to each group. These
| functions will come in handy on your quest to become a better data analyst.

...

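This lesson showed that vapply() is a safer alternative to sapply(), most useful when writing your own functions. It also showed how to use tapply() to split data into groups based on the value of some variable and then apply a function to each group. Both functions are very handy in data analysis.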

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

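Would you like to receive credit on Coursera.org for completing this course?

Choose 1. This has been explained several times already, so the details are omitted.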

| Great job!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

Great job!

You have reached the end of the lesson and are returned to the main menu.

Choose a course, or enter 0 to exit swirl.

1: R Programming
2: Go to the list of `swirl` courses.

Selection: 0 (entered by SL)

| Leaving swirl now. Type swirl() to resume.

Exiting swirl. To resume, enter swirl().

>

dim(), head(), str(), and summary() of flags

Console Log

> dim(flags)
[1] 194  30

> head(flags)
            name landmass zone area population language religion bars stripes colours red green
1    Afghanistan        5    1  648         16       10        2    0       3       5   1     1
2        Albania        3    1   29          3        6        6    0       0       3   1     0
3        Algeria        4    1 2388         20        8        2    2       0       3   1     1
4 American-Samoa        6    3    0          0        1        1    0       0       5   1     0
5        Andorra        3    1    0          0        6        0    3       0       3   1     0
6         Angola        4    2 1247          7       10        5    0       2       3   1     0
  blue gold white black orange mainhue circles crosses saltires quarters sunstars crescent
1    0    1     1     1      0   green       0       0        0        0        1        0
2    0    1     0     1      0     red       0       0        0        0        1        0
3    0    0     1     0      0   green       0       0        0        0        1        1
4    1    1     1     0      1    blue       0       0        0        0        0        0
5    1    1     0     0      0    gold       0       0        0        0        0        0
6    0    1     0     1      0     red       0       0        0        0        1        0
  triangle icon animate text topleft botright
1        0    1       0    0   black    green
2        0    0       1    0     red      red
3        0    0       0    0   green    white
4        1    1       1    0    blue      red
5        0    0       0    0    blue      red
6        0    1       0    0     red    black

> str(flags)
'data.frame':   194 obs. of  30 variables:
 $ name      : Factor w/ 194 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ landmass  : int  5 3 4 6 3 4 1 1 2 2 ...
 $ zone      : int  1 1 1 3 1 2 4 4 3 3 ...
 $ area      : int  648 29 2388 0 0 1247 0 0 2777 2777 ...
 $ population: int  16 3 20 0 0 7 0 0 28 28 ...
 $ language  : int  10 6 8 1 6 10 1 1 2 2 ...
 $ religion  : int  2 6 2 1 0 5 1 1 0 0 ...
 $ bars      : int  0 0 2 0 3 0 0 0 0 0 ...
 $ stripes   : int  3 0 0 0 0 2 1 1 3 3 ...
 $ colours   : int  5 3 3 5 3 3 3 5 2 3 ...
 $ red       : int  1 1 1 1 1 1 0 1 0 0 ...
 $ green     : int  1 0 1 0 0 0 0 0 0 0 ...
 $ blue      : int  0 0 0 1 1 0 1 1 1 1 ...
 $ gold      : int  1 1 0 1 1 1 0 1 0 1 ...
 $ white     : int  1 0 1 1 0 0 1 1 1 1 ...
 $ black     : int  1 1 0 0 0 1 0 1 0 0 ...
 $ orange    : int  0 0 0 1 0 0 1 0 0 0 ...
 $ mainhue   : Factor w/ 8 levels "black","blue",..: 5 7 5 2 4 7 8 7 2 2 ...
 $ circles   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ crosses   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ saltires  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ quarters  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ sunstars  : int  1 1 1 0 0 1 0 1 0 1 ...
 $ crescent  : int  0 0 1 0 0 0 0 0 0 0 ...
 $ triangle  : int  0 0 0 1 0 0 0 1 0 0 ...
 $ icon      : int  1 0 0 1 0 1 0 0 0 0 ...
 $ animate   : int  0 1 0 1 0 0 1 0 0 0 ...
 $ text      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ topleft   : Factor w/ 7 levels "black","blue",..: 1 6 4 2 2 6 7 1 2 2 ...
 $ botright  : Factor w/ 8 levels "black","blue",..: 5 7 8 7 7 1 2 7 2 2 ...

> summary(flags)
             name        landmass          zone            area           population     
 Afghanistan   :  1   Min.   :1.000   Min.   :1.000   Min.   :    0.0   Min.   :   0.00  
 Albania       :  1   1st Qu.:3.000   1st Qu.:1.000   1st Qu.:    9.0   1st Qu.:   0.00  
 Algeria       :  1   Median :4.000   Median :2.000   Median :  111.0   Median :   4.00  
 American-Samoa:  1   Mean   :3.572   Mean   :2.211   Mean   :  700.0   Mean   :  23.27  
 Andorra       :  1   3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:  471.2   3rd Qu.:  14.00  
 Angola        :  1   Max.   :6.000   Max.   :4.000   Max.   :22402.0   Max.   :1008.00  
 (Other)       :188                                                                      
    language        religion          bars           stripes          colours     
 Min.   : 1.00   Min.   :0.000   Min.   :0.0000   Min.   : 0.000   Min.   :1.000  
 1st Qu.: 2.00   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.: 0.000   1st Qu.:3.000  
 Median : 6.00   Median :1.000   Median :0.0000   Median : 0.000   Median :3.000  
 Mean   : 5.34   Mean   :2.191   Mean   :0.4536   Mean   : 1.552   Mean   :3.464  
 3rd Qu.: 9.00   3rd Qu.:4.000   3rd Qu.:0.0000   3rd Qu.: 3.000   3rd Qu.:4.000  
 Max.   :10.00   Max.   :7.000   Max.   :5.0000   Max.   :14.000   Max.   :8.000  
                                                                                  
      red             green             blue             gold            white       
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1.0000  
 Median :1.0000   Median :0.0000   Median :1.0000   Median :0.0000   Median :1.0000  
 Mean   :0.7887   Mean   :0.4691   Mean   :0.5103   Mean   :0.4691   Mean   :0.7526  
 3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
                                                                                     
     black           orange         mainhue      circles          crosses          saltires      
 Min.   :0.000   Min.   :0.000   red    :71   Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
 1st Qu.:0.000   1st Qu.:0.000   blue   :40   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000  
 Median :0.000   Median :0.000   green  :31   Median :0.0000   Median :0.0000   Median :0.00000  
 Mean   :0.268   Mean   :0.134   white  :22   Mean   :0.1701   Mean   :0.1495   Mean   :0.09278  
 3rd Qu.:1.000   3rd Qu.:0.000   gold   :19   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.00000  
 Max.   :1.000   Max.   :1.000   black  : 5   Max.   :4.0000   Max.   :2.0000   Max.   :1.00000  
                                 (Other): 6                                                      
    quarters         sunstars         crescent         triangle           icon       
 Min.   :0.0000   Min.   : 0.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.: 0.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
 Median :0.0000   Median : 0.000   Median :0.0000   Median :0.0000   Median :0.0000  
 Mean   :0.1495   Mean   : 1.387   Mean   :0.0567   Mean   :0.1392   Mean   :0.2526  
 3rd Qu.:0.0000   3rd Qu.: 1.000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.7500  
 Max.   :4.0000   Max.   :50.000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
                                                                                     
    animate           text           topleft      botright 
 Min.   :0.000   Min.   :0.00000   black :12   red    :69  
 1st Qu.:0.000   1st Qu.:0.00000   blue  :43   blue   :47  
 Median :0.000   Median :0.00000   gold  : 6   green  :40  
 Mean   :0.201   Mean   :0.08247   green :32   white  :17  
 3rd Qu.:0.000   3rd Qu.:0.00000   orange: 4   black  : 9  
 Max.   :1.000   Max.   :1.00000   red   :56   gold   : 9  
                                   white :41   (Other): 3  

2.12. Looking at Data

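This lesson begins work on real data. The first step is to understand what the data looks like, that is, its structure. Some of the functions have already appeared, but they are covered together here.
The translation and commentary begin at the lesson selection step. If anything is unclear, review 1.2 or the beginning of 1.1.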

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 12

Selected 12: Looking at Data (Lesson 12).

  |                                                                                         |   0%

| Whenever you're working with a new dataset, the first thing you should do is look at it! What is
| the format of the data? What are the dimensions? What are the variable names? How are the
| variables stored? Are there missing data? Are there any flaws in the data?

...

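The first thing to do with a new dataset is to look at it. What is the format? What are the dimensions? What are the variable names? How are the variables stored? Are there missing values? Are there flaws in the data?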

  |====                                                                                     |   4%
| This lesson will teach you how to answer these questions and more using R's built-in functions.
| We'll be using a dataset constructed from the United States Department of Agriculture's PLANTS
| Database (http://plants.usda.gov/adv_search.html).

...

This lesson answers these questions using R's built-in functions. The dataset is constructed from the United States Department of Agriculture's PLANTS Database (http://plants.usda.gov/adv_search.html).

  |=======                                                                                  |   8%
| I've stored the data for you in a variable called plants. Type ls() to list the variables in
| your workspace, among which should be plants.

The data is stored in a variable called plants. Type ls() to list the variables in your workspace. plants should be among them.

> ls()
 [1] "%p%"             "boring_function" "cls_list"        "cls_vect"        "cnames"         
 [6] "evaluate"        "flag_colors"     "flag_shapes"     "flags"           "ints"           
[11] "mad_libs"        "my_char"         "my_data"         "my_matrix"       "my_matrix2"     
[16] "my_mean"         "my_na"           "my_name"         "my_norm"         "my_seq"         
[21] "my_vector"       "num_vect"        "ok"              "old.dir"         "patients"       
[26] "plants"          "remainder"       "shape_mat"       "telegram"        "tf"             
[31] "unique_vals"     "vect"            "vect2"           "viewinfo"        "x"              
[36] "y"               "z"              

| Excellent job!

Excellent.

The list contains many objects left over from earlier swirl lessons, and “plants” is among them. These leftovers could have been removed beforehand.

  |===========                                                                              |  12%
| Let's begin by checking the class of the plants variable with class(plants). This will give us a
| clue as to the overall structure of the data.

Check the class of the variable plants with class(plants). This gives the first clue to the overall structure of the data.

> class(plants)
[1] "data.frame"

| All that hard work is paying off!

Hard work pays off.

  |==============                                                                           |  16%
| It's very common for data to be stored in a data frame. It is the default class for data read
| into R using functions like read.csv() and read.table(), which you'll learn about in another
| lesson.

...

Storing data in a data frame is very common. It is the default class for data read into R with functions such as read.csv() and read.table(), which are covered in another lesson.
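
As a small illustration of that default, the sketch below reads a tiny CSV from an inline string (through the `text` argument of read.csv()) instead of a file. The column names and values are made up for the example.

```r
# Hypothetical two-column CSV supplied inline instead of from a file
csv_text <- "name,height\nA,1.5\nB,2.0"
small <- read.csv(text = csv_text)

class(small)   # the result is a data frame
dim(small)     # 2 rows, 2 columns
```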

  |==================                                                                       |  20%
| Since the dataset is stored in a data frame, we know it is rectangular. In other words, it has
| two dimensions (rows and columns) and fits neatly into a table or spreadsheet. Use dim(plants)
| to see exactly how many rows and columns we're dealing with.

Because the dataset is a data frame, it is rectangular: it has two dimensions (rows and columns) and fits neatly into a single table or spreadsheet. Use dim(plants) to see exactly how many rows and columns there are.

> dim(plants)
[1] 5166   10

| That's a job well done!

Well done.

  |=====================                                                                    |  24%
| The first number you see (5166) is the number of rows (observations) and the second number (10)
| is the number of columns (variables).

...

The first number, 5166, is the number of rows (observations), and the second number, 10, is the number of columns (variables).

  |=========================                                                                |  28%
| You can also use nrow(plants) to see only the number of rows. Try it out.

nrow(plants) shows only the number of rows. Try it.

> nrow(plants)
[1] 5166

| Nice work!

That works.

  |============================                                                             |  32%
| ... And ncol(plants) to see only the number of columns.

Similarly, ncol(plants) shows only the number of columns.

> ncol(plants)
[1] 10

| That's the answer I was looking for.

Yes, that is the expected answer.
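
The relationship between dim(), nrow(), and ncol() can be checked on any data frame. The sketch below uses iris, a dataset that ships with R, as a stand-in for plants (which exists only inside the swirl lesson).

```r
# iris ships with R (datasets package); it stands in for plants here
d <- dim(iris)
d                       # rows first, then columns
nrow(iris) == d[1]      # TRUE
ncol(iris) == d[2]      # TRUE
```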

  |================================                                                         |  36%
| If you are curious as to how much space the dataset is occupying in memory, you can use
| object.size(plants).

To find out how much memory the dataset occupies, use object.size(plants).

> object.size(plants)
686080 bytes

| All that practice is paying off!

Practice pays off.
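
Byte counts are hard to read for larger objects. As a hedged sketch, the size can also be printed in other units. The example again uses the built-in iris dataset rather than plants.

```r
sz <- object.size(iris)        # size of a built-in dataset
print(sz, units = "Kb")        # same size, printed in kilobytes
format(sz, units = "auto")     # lets R pick a readable unit
```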

  |====================================                                                     |  40%
| Now that we have a sense of the shape and size of the dataset, let's get a feel for what's
| inside. names(plants) will return a character vector of column (i.e. variable) names. Give it a
| shot.

We now have a sense of the shape and size of the dataset. names(plants) returns a character vector of the column (that is, variable) names. Try it.

> names(plants)
 [1] "Scientific_Name"      "Duration"             "Active_Growth_Period" "Foliage_Color"       
 [5] "pH_Min"               "pH_Max"               "Precip_Min"           "Precip_Max"          
 [9] "Shade_Tolerance"      "Temp_Min_F"          

| Excellent job!

Excellent.

  |=======================================                                                  |  44%
| We've applied fairly descriptive variable names to this dataset, but that won't always be the
| case. A logical next step is to peek at the actual data. However, our dataset contains over 5000
| observations (rows), so it's impractical to view the whole thing all at once.

...

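The variable names in this dataset are fairly descriptive, but that will not always be the case. A logical next step is to peek at the actual data. However, the dataset contains over 5000 observations (rows), so viewing all of it at once is impractical.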

  |===========================================                                              |  48%
| The head() function allows you to preview the top of the dataset. Give it a try with only one
| argument.

The head() function previews the top of the dataset. Try it with only one argument.

> head(plants)
               Scientific_Name          Duration Active_Growth_Period Foliage_Color pH_Min pH_Max
1                  Abelmoschus              <NA>                 <NA>          <NA>     NA     NA
2       Abelmoschus esculentus Annual, Perennial                 <NA>          <NA>     NA     NA
3                        Abies              <NA>                 <NA>          <NA>     NA     NA
4               Abies balsamea         Perennial    Spring and Summer         Green      4      6
5 Abies balsamea var. balsamea         Perennial                 <NA>          <NA>     NA     NA
6                     Abutilon              <NA>                 <NA>          <NA>     NA     NA
  Precip_Min Precip_Max Shade_Tolerance Temp_Min_F
1         NA         NA            <NA>         NA
2         NA         NA            <NA>         NA
3         NA         NA            <NA>         NA
4         13         60        Tolerant        -43
5         NA         NA            <NA>         NA
6         NA         NA            <NA>         NA

| Excellent work!

Excellent.

  |==============================================                                           |  52%
| Take a minute to look through and understand the output above. Each row is labeled with the
| observation number and each column with the variable name. Your screen is probably not wide
| enough to view all 10 columns side-by-side, in which case R displays as many columns as it can
| on each line before continuing on the next.

...

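Take a moment to look through the output above. Each row is labeled with its observation number and each column with its variable name. The screen is probably not wide enough to show all 10 columns side by side, in which case R prints as many columns as fit on each line and continues with the rest below.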

  |==================================================                                       |  56%
| By default, head() shows you the first six rows of the data. You can alter this behavior by
| passing as a second argument the number of rows you'd like to view. Use head() to preview the
| first 10 rows of plants.

By default, head() shows the first six rows of the data. To change this, pass the number of rows you want to see as the second argument. Use head() to show the first 10 rows of plants.

> head(plants, 10)
                     Scientific_Name          Duration Active_Growth_Period Foliage_Color pH_Min
1                        Abelmoschus              <NA>                 <NA>          <NA>     NA
2             Abelmoschus esculentus Annual, Perennial                 <NA>          <NA>     NA
3                              Abies              <NA>                 <NA>          <NA>     NA
4                     Abies balsamea         Perennial    Spring and Summer         Green      4
5       Abies balsamea var. balsamea         Perennial                 <NA>          <NA>     NA
6                           Abutilon              <NA>                 <NA>          <NA>     NA
7               Abutilon theophrasti            Annual                 <NA>          <NA>     NA
8                             Acacia              <NA>                 <NA>          <NA>     NA
9                  Acacia constricta         Perennial    Spring and Summer         Green      7
10 Acacia constricta var. constricta         Perennial                 <NA>          <NA>     NA
   pH_Max Precip_Min Precip_Max Shade_Tolerance Temp_Min_F
1      NA         NA         NA            <NA>         NA
2      NA         NA         NA            <NA>         NA
3      NA         NA         NA            <NA>         NA
4     6.0         13         60        Tolerant        -43
5      NA         NA         NA            <NA>         NA
6      NA         NA         NA            <NA>         NA
7      NA         NA         NA            <NA>         NA
8      NA         NA         NA            <NA>         NA
9     8.5          4         20      Intolerant        -13
10     NA         NA         NA            <NA>         NA

| That's the answer I was looking for.

Yes, that is the expected answer.

  |=====================================================                                    |  60%
| The same applies for using tail() to preview the end of the dataset. Use tail() to view the last
| 15 rows.

In the same way, tail() previews the end of the dataset. Use tail() to view the last 15 rows.

> tail(plants, 15)
                      Scientific_Name  Duration Active_Growth_Period Foliage_Color pH_Min pH_Max
5152                          Zizania      <NA>                 <NA>          <NA>     NA     NA
5153                 Zizania aquatica    Annual               Spring         Green    6.4    7.4
5154   Zizania aquatica var. aquatica    Annual                 <NA>          <NA>     NA     NA
5155                Zizania palustris    Annual                 <NA>          <NA>     NA     NA
5156 Zizania palustris var. palustris    Annual                 <NA>          <NA>     NA     NA
5157                      Zizaniopsis      <NA>                 <NA>          <NA>     NA     NA
5158             Zizaniopsis miliacea Perennial    Spring and Summer         Green    4.3    9.0
5159                            Zizia      <NA>                 <NA>          <NA>     NA     NA
5160                     Zizia aptera Perennial                 <NA>          <NA>     NA     NA
5161                      Zizia aurea Perennial                 <NA>          <NA>     NA     NA
5162                 Zizia trifoliata Perennial                 <NA>          <NA>     NA     NA
5163                          Zostera      <NA>                 <NA>          <NA>     NA     NA
5164                   Zostera marina Perennial                 <NA>          <NA>     NA     NA
5165                           Zoysia      <NA>                 <NA>          <NA>     NA     NA
5166                  Zoysia japonica Perennial                 <NA>          <NA>     NA     NA
     Precip_Min Precip_Max Shade_Tolerance Temp_Min_F
5152         NA         NA            <NA>         NA
5153         30         50      Intolerant         32
5154         NA         NA            <NA>         NA
5155         NA         NA            <NA>         NA
5156         NA         NA            <NA>         NA
5157         NA         NA            <NA>         NA
5158         35         70      Intolerant         12
5159         NA         NA            <NA>         NA
5160         NA         NA            <NA>         NA
5161         NA         NA            <NA>         NA
5162         NA         NA            <NA>         NA
5163         NA         NA            <NA>         NA
5164         NA         NA            <NA>         NA
5165         NA         NA            <NA>         NA
5166         NA         NA            <NA>         NA

| Nice work!

That works.
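
The second argument of head() and tail() can be tried on a small made-up data frame, where the result is easy to check by eye. The data frame `df` below is hypothetical. The negative-n form is not covered in the lesson but is standard behavior of head().

```r
df <- data.frame(id = 1:8, value = (1:8)^2)  # made-up data
head(df, 3)     # first 3 rows
tail(df, 2)     # last 2 rows
head(df, -6)    # negative n: everything except the last 6 rows
```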

  |=========================================================                                |  64%
| After previewing the top and bottom of the data, you probably noticed lots of NAs, which are R's
| placeholders for missing values. Use summary(plants) to get a better feel for how each variable
| is distributed and how much of the dataset is missing.

After looking at the top and bottom of the data, you have probably noticed many NAs. These are R's placeholders for missing values. Use summary(plants) to see how each variable is distributed and how much of the dataset is missing.

> summary(plants)
                     Scientific_Name              Duration              Active_Growth_Period
 Abelmoschus                 :   1   Perennial        :3031   Spring and Summer   : 447     
 Abelmoschus esculentus      :   1   Annual           : 682   Spring              : 144     
 Abies                       :   1   Annual, Perennial: 179   Spring, Summer, Fall:  95     
 Abies balsamea              :   1   Annual, Biennial :  95   Summer              :  92     
 Abies balsamea var. balsamea:   1   Biennial         :  57   Summer and Fall     :  24     
 Abutilon                    :   1   (Other)          :  92   (Other)             :  30     
 (Other)                     :5160   NA's             :1030   NA's                :4334     
      Foliage_Color      pH_Min          pH_Max         Precip_Min      Precip_Max    
 Dark Green  :  82   Min.   :3.000   Min.   : 5.100   Min.   : 4.00   Min.   : 16.00  
 Gray-Green  :  25   1st Qu.:4.500   1st Qu.: 7.000   1st Qu.:16.75   1st Qu.: 55.00  
 Green       : 692   Median :5.000   Median : 7.300   Median :28.00   Median : 60.00  
 Red         :   4   Mean   :4.997   Mean   : 7.344   Mean   :25.57   Mean   : 58.73  
 White-Gray  :   9   3rd Qu.:5.500   3rd Qu.: 7.800   3rd Qu.:32.00   3rd Qu.: 60.00  
 Yellow-Green:  20   Max.   :7.000   Max.   :10.000   Max.   :60.00   Max.   :200.00  
 NA's        :4334   NA's   :4327    NA's   :4327     NA's   :4338    NA's   :4338    
     Shade_Tolerance   Temp_Min_F    
 Intermediate: 242   Min.   :-79.00  
 Intolerant  : 349   1st Qu.:-38.00  
 Tolerant    : 246   Median :-33.00  
 NA's        :4329   Mean   :-22.53  
                     3rd Qu.:-18.00  
                     Max.   : 52.00  
                     NA's   :4328    

| You got it right!

That is right.

  |=============================================================                            |  68%
| summary() provides different output for each variable, depending on its class. For numeric data
| such as Precip_Min, summary() displays the minimum, 1st quartile, median, mean, 3rd quartile,
| and maximum. These values help us understand how the data are distributed.

...

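summary() gives different output for each variable depending on its class. For numeric data such as Precip_Min, summary() shows the minimum, first quartile, median, mean, third quartile, and maximum. These values help us understand how the data are distributed.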

  |================================================================                         |  72%
| For categorical variables (called 'factor' variables in R), summary() displays the number of
| times each value (or 'level') occurs in the data. For example, each value of Scientific_Name
| only appears once, since it is unique to a specific plant. In contrast, the summary for Duration
| (also a factor variable) tells us that our dataset contains 3031 Perennial plants, 682 Annual
| plants, etc.

...

For categorical variables (called factors in R), summary() shows how many times each value (called a level) occurs in the data. For example, each value of Scientific_Name appears only once, because it is unique to a specific plant. In contrast, the summary for Duration (how long a plant lives, also a factor variable) shows 3031 Perennial plants, 682 Annual plants, and so on.
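
The difference between the two kinds of output is easy to see on a toy example. The data frame `toy` below is made up for illustration, with one missing value in each column. colSums(is.na(...)) is one common way to count the NAs per column directly.

```r
toy <- data.frame(
  kind  = factor(c("a", "b", "a", NA)),  # factor with one missing value
  score = c(1.5, NA, 3.0, 4.5)           # numeric with one missing value
)
summary(toy)            # counts per level for kind, quartiles for score
colSums(is.na(toy))     # number of NAs in each column
```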

  |====================================================================                     |  76%
| You can see that R truncated the summary for Active_Growth_Period by including a catch-all
| category called 'Other'. Since it is a categorical/factor variable, we can see how many times
| each value actually occurs in the data with table(plants$Active_Growth_Period).

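In the summary for Active_Growth_Period, R truncated the output by lumping the remaining values into a catch-all category, ‘Other’. Because this is a categorical (factor) variable, table(plants$Active_Growth_Period) shows how many times each value actually occurs in the data.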

> table(plants$Active_Growth_Period)

Fall, Winter and Spring                  Spring         Spring and Fall       Spring and Summer 
                     15                     144                      10                     447 
   Spring, Summer, Fall                  Summer         Summer and Fall              Year Round 
                     95                      92                      24                       5 

| Your dedication is inspiring!

Impressive.
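
Note that the counts above add up to 832, which is 5166 rows minus the 4334 NAs reported by summary(): table() leaves out missing values by default. A minimal sketch with a made-up factor `x` shows how the `useNA` argument brings them back.

```r
x <- factor(c("Spring", "Summer", NA, "Spring", NA))  # made-up values
table(x)                    # NAs are dropped by default
table(x, useNA = "ifany")   # adds a count for NA
```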

  |=======================================================================                  |  80%
| Each of the functions we've introduced so far has its place in helping you to better understand
| the structure of your data. However, we've left the best for last....

...

Each function introduced so far helps in its own way to understand the structure of the data. However, the best one has been saved for last.

  |===========================================================================              |  84%
| Perhaps the most useful and concise function for understanding the *str*ucture of your data is
| str(). Give it a try now.

Perhaps the most useful and concise function for understanding the structure of data is str(). Try it.

> str(plants)
'data.frame':   5166 obs. of  10 variables:
 $ Scientific_Name     : Factor w/ 5166 levels "Abelmoschus",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Duration            : Factor w/ 8 levels "Annual","Annual, Biennial",..: NA 4 NA 7 7 NA 1 NA 7 7 ...
 $ Active_Growth_Period: Factor w/ 8 levels "Fall, Winter and Spring",..: NA NA NA 4 NA NA NA NA 4 NA ...
 $ Foliage_Color       : Factor w/ 6 levels "Dark Green","Gray-Green",..: NA NA NA 3 NA NA NA NA 3 NA ...
 $ pH_Min              : num  NA NA NA 4 NA NA NA NA 7 NA ...
 $ pH_Max              : num  NA NA NA 6 NA NA NA NA 8.5 NA ...
 $ Precip_Min          : int  NA NA NA 13 NA NA NA NA 4 NA ...
 $ Precip_Max          : int  NA NA NA 60 NA NA NA NA 20 NA ...
 $ Shade_Tolerance     : Factor w/ 3 levels "Intermediate",..: NA NA NA 3 NA NA NA NA 2 NA ...
 $ Temp_Min_F          : int  NA NA NA -43 NA NA NA NA -13 NA ...

| Perseverance, that's the answer.

You persevered, and that is the answer.

  |==============================================================================           |  88%
| The beauty of str() is that it combines many of the features of the other functions you've
| already seen, all in a concise and readable format. At the very top, it tells us that the class
| of plants is 'data.frame' and that it has 5166 observations and 10 variables. It then gives us
| the name and class of each variable, as well as a preview of its contents.

...

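str() combines many features of the functions seen so far in a concise and readable format. The top line says that the class of plants is ‘data.frame’ and that it has 5166 observations and 10 variables. Below that, it gives the name and class of each variable together with a preview of its contents.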

  |==================================================================================       |  92%
| str() is actually a very general function that you can use on most objects in R. Any time you
| want to understand the structure of something (a dataset, function, etc.), str() is a good place
| to start.

...

str() is a very general function that works on most objects in R. Whenever you want to understand the structure of something (a dataset, a function, and so on), str() is a good place to start.
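
To see that generality in practice, str() can be applied to a few different kinds of object. The objects below are arbitrary examples, not part of the lesson.

```r
str(1:3)                        # an integer vector
str(list(a = 1, b = "text"))    # a list with two components
str(mean)                       # a function
```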

  |=====================================================================================    |  96%
| In this lesson, you learned how to get a feel for the structure and contents of a new dataset
| using a collection of simple and useful functions. Taking the time to do this upfront can save
| you time and frustration later on in your analysis.

...

This lesson covered simple but useful functions for getting a feel for the structure and contents of a new dataset. Doing this at the start of an analysis saves time and frustration later.

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

Would you like credit on Coursera.org for completing this course?

Select 1. This has been explained several times already, so the details are omitted.

| You got it!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

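That is right.

You have reached the end of the lesson, and swirl returns to the main menu.

Choose a course, or type 0 to exit swirl. Option 1 is the R Programming course, and option 2 opens the list of `swirl` courses.

Selection: 0 (entered by SL)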

| Leaving swirl now. Type swirl() to resume.

Leaving swirl. To resume, type swirl().

>

2.13. Simulation

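This lesson teaches simulation, that is, imitating an experiment on the computer. Probability distributions appear, but you can follow the lesson without knowing them. The examples are things like rolling dice a few times or flipping a coin and recording heads or tails. The binomial, normal, and Poisson distributions appear.
The translation and commentary start from the lesson selection. If anything is unclear, review section 1.2 or the beginning of section 1.1.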

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 13

We selected 13: Simulation (Lesson 13).

  |                                                                                         |   0%

| One of the great advantages of using a statistical programming language like R is its vast
| collection of tools for simulating random numbers.

...

One of the great advantages of a statistical programming language like R is its large collection of tools for simulating random numbers.

  |===                                                                                      |   3%
| This lesson assumes familiarity with a few common probability distributions, but these topics
| will only be discussed with respect to random number generation. Even if you have no prior
| experience with these concepts, you should be able to complete the lesson and understand the
| main ideas.

...

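This lesson assumes some familiarity with a few common probability distributions, but they are discussed only in connection with random number generation. Even without prior experience of these concepts, you should be able to complete the lesson and understand the main ideas.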

  |=====                                                                                    |   6%
| The first function we'll use to generate random numbers is sample(). Use ?sample to pull up the
| documentation.

The first function for generating random numbers is sample(). Type ?sample to open its Help page.

> ?sample

| Excellent work!

Excellent.

  |========                                                                                 |   9%
| Let's simulate rolling four six-sided dice: sample(1:6, 4, replace = TRUE).

Simulate rolling four six-sided dice with sample(1:6, 4, replace = TRUE).

sample(1:6, 4, replace = TRUE)
## [1] 4 4 6 3
| Perseverance, that's the answer.

You persevered, and that is the answer.

  |===========                                                                              |  12%
| Now repeat the command to see how your result differs. (The probability of rolling the exact
| same result is (1/6)^4 = 0.00077, which is pretty small!)

Run the same command again and see how the result differs. (The probability of getting exactly the same result is (1/6)^4 = 0.00077, which is very small.)

sample(1:6, 4, replace = TRUE)
## [1] 5 3 1 1
| You are doing so well!

You are doing very well.
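
The figure quoted in the message can be reproduced directly. The variable name `p_same` is introduced here only for illustration.

```r
p_same <- (1/6)^4     # four independent dice, each matching with probability 1/6
p_same
round(p_same, 5)      # 0.00077
1 / p_same            # about 1 chance in 1296
```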

  |=============                                                                            |  15%
| sample(1:6, 4, replace = TRUE) instructs R to randomly select four numbers between 1 and 6, WITH
| replacement. Sampling with replacement simply means that each number is "replaced" after it is
| selected, so that the same number can show up more than once. This is what we want here, since
| what you roll on one die shouldn't affect what you roll on any of the others.

...

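sample(1:6, 4, replace = TRUE) tells R to randomly select four numbers between 1 and 6 with replacement. Sampling with replacement means that each number is put back after it is selected, so the same number can appear more than once. That is what we want here, because the result of one die does not affect the others.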
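
Because the rolls are random, the output differs from run to run, as the two results above show. When a repeatable result is needed, one option (not part of the lesson) is to fix the random seed with set.seed(). The seed value 42 below is arbitrary.

```r
set.seed(42)                                  # arbitrary seed, chosen for illustration
first  <- sample(1:6, 4, replace = TRUE)
set.seed(42)                                  # reset to the same seed
second <- sample(1:6, 4, replace = TRUE)
identical(first, second)                      # TRUE: same seed, same rolls
```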

  |================                                                                         |  18%
| Now sample 10 numbers between 1 and 20, WITHOUT replacement. To sample without replacement,
| simply leave off the 'replace' argument.

Now sample 10 numbers between 1 and 20 WITHOUT replacement. To do so, simply leave off the replace argument.

sample(1:20, 10)
##  [1] 19  8  1  7  3 12 16  6 13 15
| You got it right!

That is right.

  |===================                                                                      |  21%
| Since the last command sampled without replacement, no number appears more than once in the
| output.

...

Because this command sampled without replacement, no number appears more than once in the output.
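
One way to confirm the absence of repeats, and to see the limit that sampling without replacement imposes, is sketched below. The names `draw` and `res` are hypothetical.

```r
draw <- sample(1:20, 10)
anyDuplicated(draw)        # 0 means no value is repeated

# Asking for more values than exist fails without replacement
res <- tryCatch(sample(1:6, 10), error = function(e) "error")
res
```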

  |======================                                                                   |  24%
| LETTERS is a predefined variable in R containing a vector of all 26 letters of the English
| alphabet. Take a look at it now.

LETTERS is a predefined variable in R: a vector of all 26 letters of the English alphabet. Take a look.

LETTERS
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
## [20] "T" "U" "V" "W" "X" "Y" "Z"
| You are really on a roll!

You are on a roll.

  |========================                                                                 |  27%
| The sample() function can also be used to permute, or rearrange, the elements of a vector. For
| example, try sample(LETTERS) to permute all 26 letters of the English alphabet.

sample() can also permute (rearrange) the elements of a vector. For example, try sample(LETTERS) to permute all 26 letters of the English alphabet.

sample(LETTERS)
##  [1] "K" "V" "B" "O" "I" "N" "D" "M" "H" "Y" "E" "C" "J" "L" "P" "R" "W" "X" "F"
## [20] "G" "A" "T" "S" "Q" "U" "Z"
| You are amazing!

Amazing.

  |===========================                                                              |  30%
| This is identical to taking a sample of size 26 from LETTERS, without replacement. When the
| 'size' argument to sample() is not specified, R takes a sample equal in size to the vector from
| which you are sampling.

...

This is the same as taking a sample of size 26 from LETTERS without replacement. When the ‘size’ argument of sample() is omitted, R draws as many elements as the vector contains.
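
A quick check that the shuffled vector really is a permutation, meaning the same 26 letters in a different order, might look like this. The name `perm` is introduced only for the sketch.

```r
perm <- sample(LETTERS)
length(perm)                             # 26
identical(sort(perm), sort(LETTERS))     # TRUE: same letters, new order
```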

  |==============================                                                           |  33%
| Now, suppose we want to simulate 100 flips of an unfair two-sided coin. This particular coin has
| a 0.3 probability of landing 'tails' and a 0.7 probability of landing 'heads'.

...

Next, suppose we want to simulate 100 flips of an unfair two-sided coin. This coin lands ‘tails’ with probability 0.3 and ‘heads’ with probability 0.7.

  |================================                                                         |  36%
| Let the value 0 represent tails and the value 1 represent heads. Use sample() to draw a sample
| of size 100 from the vector c(0,1), with replacement. Since the coin is unfair, we must attach
| specific probabilities to the values 0 (tails) and 1 (heads) with a fourth argument, prob =
| c(0.3, 0.7). Assign the result to a new variable called flips.

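Let 0 represent tails and 1 represent heads. Use sample() to draw a sample of size 100 from the vector c(0,1) with replacement. Because the coin is unfair, attach the probabilities for 0 (tails) and 1 (heads) as a fourth argument, prob = c(0.3, 0.7). Assign the result to a new variable called flips.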

flips <- sample(c(0,1), 100, replace = TRUE, prob = c(0.3, 0.7))
| All that practice is paying off!

Practice pays off.

  |===================================                                                      |  39%
| View the contents of the flips variable.

Look at the contents of flips.

flips
##   [1] 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 0
##  [38] 0 1 0 1 1 1 0 1 0 1 1 1 1 1 0 0 0 0 0 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 0 1 0
##  [75] 1 0 1 0 1 1 0 1 1 1 0 1 1 0 1 1 0 1 0 0 0 1 1 1 1 1
| You nailed it! Good job!

Nicely done.

  |======================================                                                   |  42%
| Since we set the probability of landing heads on any given flip to be 0.7, we'd expect
| approximately 70 of our coin flips to have the value 1. Count the actual number of 1s contained
| in flips using the sum() function.

Because the probability of heads is 0.7, about 70 of the 100 flips should have the value 1. Count the actual number of 1s in flips with sum().

sum(flips)
## [1] 69
| You got it!

That is right.
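
Since the values are only 0 and 1, mean() gives the proportion of heads directly. The sketch below uses a much larger sample than the lesson (100000 flips, an assumption made here so the proportion settles near 0.7) and an arbitrary seed. The name `flips_big` is hypothetical.

```r
set.seed(1)   # arbitrary seed so the sketch is repeatable
flips_big <- sample(c(0, 1), 100000, replace = TRUE, prob = c(0.3, 0.7))
mean(flips_big)     # proportion of heads, close to 0.7
table(flips_big)    # counts of 0s and 1s
```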

  |========================================                                                 |  45%
| A coin flip is a binary outcome (0 or 1) and we are performing 100 independent trials (coin
| flips), so we can use rbinom() to simulate a binomial random variable. Pull up the documentation
| for rbinom() using ?rbinom.

A coin flip has a binary outcome (0 or 1), and we performed 100 independent trials (coin flips), so the experiment can be simulated as a binomial random variable with rbinom(). Open the Help page for rbinom() with ?rbinom.

> ?rbinom

| Perseverance, that's the answer.

You persevered, and that is the answer.

  |===========================================                                              |  48%
| Each probability distribution in R has an r*** function (for "random"), a d*** function (for
| "density"), a p*** (for "probability"), and q*** (for "quantile"). We are most interested in the
| r*** functions in this lesson, but I encourage you to explore the others on your own.

...

Each probability distribution in R has an r*** function (r for random), a d*** function (density), a p*** function (probability), and a q*** function (quantile). This lesson concentrates on the r*** functions, but the others are worth exploring on your own.
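
For readers who want a first look at the other three families, here is a minimal sketch using the binomial and normal distributions. The particular arguments (70 heads, the 0.975 quantile) are chosen only for illustration.

```r
dbinom(70, size = 100, prob = 0.7)    # P(exactly 70 heads in 100 flips)
pbinom(70, size = 100, prob = 0.7)    # P(70 heads or fewer)
qnorm(0.975)                          # 97.5% quantile of the standard normal, about 1.96
pnorm(qnorm(0.975))                   # p*** undoes q***, giving 0.975 back
```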

  |==============================================                                           |  52%
| A binomial random variable represents the number of 'successes' (heads) in a given number of
| independent 'trials' (coin flips). Therefore, we can generate a single random variable that
| represents the number of heads in 100 flips of our unfair coin using rbinom(1, size = 100, prob
| = 0.7). Note that you only specify the probability of 'success' (heads) and NOT the probability
| of 'failure' (tails). Try it now.

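A binomial random variable is the number of ‘successes’ (heads) in a given number of independent ‘trials’ (coin flips). So rbinom(1, size = 100, prob = 0.7) generates a single random number representing how many heads come up in 100 flips of the unfair coin. Note that you specify only the probability of ‘success’ (heads), not the probability of ‘failure’ (tails).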

rbinom(1, size = 100, prob = 0.7)
## [1] 68
| Perseverance, that's the answer.

You persevered, and that is the answer.

  |=================================================                                        |  55%
| Equivalently, if we want to see all of the 0s and 1s, we can request 100 observations, each of
| size 1, with success probability of 0.7. Give it a try, assigning the result to a new variable
| called flips2.

Equivalently, to see all the 0s and 1s, request 100 observations, each of size 1, with success probability 0.7. Save the result as flips2.

flips2 <- rbinom(100, size = 1, prob = 0.7)
| Perseverance, that's the answer.

You persevered, and that is the answer.

  |===================================================                                      |  58%
| View the contents of flips2.

Look at the contents of flips2.

flips2
##   [1] 1 1 0 1 1 1 0 1 1 0 1 0 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1
##  [38] 1 0 1 1 1 0 0 1 1 1 1 1 0 0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1
##  [75] 1 1 1 1 0 0 0 0 1 0 0 1 1 1 1 0 1 0 1 1 0 0 0 0 1 1
| That's correct!

Correct.

  |======================================================                                   |  61%
| Now use sum() to count the number of 1s (heads) in flips2. It should be close to 70!

Now use sum() to count the number of 1s (heads) in flips2. It should be close to 70.

sum(flips2)
## [1] 69
| That's a job well done!

Well done.
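
A single run of 100 flips gives one count near 70. To see how such counts behave over many runs, the first argument of rbinom() can be raised. The sketch below assumes 10000 repetitions and an arbitrary seed. The name `counts` is hypothetical.

```r
set.seed(2)   # arbitrary seed
counts <- rbinom(10000, size = 100, prob = 0.7)   # 10000 experiments of 100 flips each
mean(counts)      # close to 100 * 0.7 = 70
range(counts)     # individual experiments vary around 70
```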

  |=========================================================                                |  64%
| Similar to rbinom(), we can use R to simulate random numbers from many other probability
| distributions. Pull up the documentation for rnorm() now.

As with rbinom(), R can simulate random numbers from many other probability distributions. As an example, open the Help page for rnorm().

> ?rnorm

| Great job!

Great job.

  |===========================================================                              |  67%
| The standard normal distribution has mean 0 and standard deviation 1. As you can see under the
| 'Usage' section in the documentation, the default values for the 'mean' and 'sd' arguments to
| rnorm() are 0 and 1, respectively. Thus, rnorm(10) will generate 10 random numbers from a
| standard normal distribution. Give it a try.

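The standard normal distribution has mean 0 and standard deviation 1. As the ‘Usage’ section of the Help page shows, the default values of the ‘mean’ and ‘sd’ arguments of rnorm() are 0 and 1, respectively. So rnorm(10) generates 10 random numbers from a standard normal distribution.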

rnorm(10)
##  [1] -0.5136386  1.0615830  0.8032615 -1.3926282  0.5530357 -1.2148542
##  [7]  1.7256428  1.6045784  0.7498811 -1.0670571
| Perseverance, that's the answer.

You persevered, and that is the answer.

  |==============================================================                           |  70%
| Now do the same, except with a mean of 100 and a standard deviation of 25.

Do the same with a normal distribution that has mean 100 and standard deviation 25.

rnorm(10, mean = 100, sd = 25)
##  [1]  64.76708  90.09005 119.67386 132.86960 111.54579  80.05033 122.07953
##  [8]  72.45353  79.11323  98.26489
| You got it!

That's right.
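
One way to check that the mean and sd arguments behave as described is to draw a large sample and compare its sample statistics with the requested values. The sketch below does that, and also shows that shifting and scaling standard normal draws gives the same numbers when the seed is the same. The seed, the sample sizes, and the variable names are arbitrary assumptions and not part of the lesson.

```r
set.seed(1)
x <- rnorm(10000, mean = 100, sd = 25)
mean(x)  # close to 100
sd(x)    # close to 25

# Same seed, two routes: direct arguments versus rescaling standard normal draws
set.seed(1)
a <- rnorm(5, mean = 100, sd = 25)
set.seed(1)
b <- 100 + 25 * rnorm(5)
all.equal(a, b)
```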

  |=================================================================                        |  73%
| Finally, what if we want to simulate 100 *groups* of random numbers, each containing 5 values
| generated from a Poisson distribution with mean 10? Let's start with one group of 5 numbers,
| then I'll show you how to repeat the operation 100 times in a convenient and compact way.

...

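Finally, let's simulate 100 groups of random numbers, each containing 5 values drawn from a Poisson distribution with mean 10. We start by generating a single group of 5 numbers, and then learn a compact way to repeat the operation 100 times.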

  |===================================================================                      |  76%
| Generate 5 random values from a Poisson distribution with mean 10. Check out the documentation
| for rpois() if you need help.

Generate 5 random values from a Poisson distribution with mean 10. See the Help page for rpois() if you need it.

rpois(5, 10)
## [1]  9  8  6  7 10
| Excellent work!

Excellent.

  |======================================================================                   |  79%
| Now use replicate(100, rpois(5, 10)) to perform this operation 100 times. Store the result in a
| new variable called my_pois.

Now run replicate(100, rpois(5, 10)) to perform the operation 100 times, and store the result in a new variable named my_pois.

my_pois <- replicate(100, rpois(5, 10))
| You got it right!

That's right.

  |=========================================================================                |  82%
| Take a look at the contents of my_pois.

Let's look at the contents of my_pois.

my_pois
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]   18    7   19    8    8    8    9   15   10     9    11    13     3    10
## [2,]   12   14    6    6   13    4   12   10    8    14    10     9     6     8
## [3,]   10   15   10   12    7    6   10   11    8     7     8    10    14     9
## [4,]   10   16    9    4   10   13   10   10   15    15     6    15    12    13
## [5,]   15    9    7   10    5    9    9    5   14    11    10    12    13     7
##      [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
## [1,]    12    12    14    12    10     6    10    13     9     8    11    12
## [2,]    11    11     8     6     7     8    10     8    13    13    11     7
## [3,]     6     7     3    15     9     6    11    11     8    13    12     9
## [4,]     8     8     6     9    11     7    13    12    11    10     8     6
## [5,]     8     5     7     7    12    13     6    11     8     8    12     9
##      [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
## [1,]    10    13     9    13    13    11    10    11    11    14    10     7
## [2,]     7    11     6    12    11    14    12    12    10     8    11     8
## [3,]    10    15    10     8    11    13    11    12     7     9    10    11
## [4,]    15     9    10    11     9     6     7    11     9    16    15    14
## [5,]    10    12     6    11     7    10     8     8    10    12     6     8
##      [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50]
## [1,]     5    12    15     7     9    10     7     8    13     5     6     8
## [2,]    10     9     9     9    12    12     5    12    11     6    13    12
## [3,]    11    10    10     9     4     9     5     4    13    13    10    12
## [4,]    12    11    11     9     8     9    15    11    13    10    12    10
## [5,]    10    12     6     9    14    14     5    11    10     8     3     3
##      [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61] [,62]
## [1,]     7    11    11    12    10     4    12     6    13    10    11     7
## [2,]     8    15    13    13    11     9     4     7     8     8    13     8
## [3,]     7     8    13    14    10     7     7     8    12    14     7     7
## [4,]     8    12    11     9     7     8     9    11    11    11    12    12
## [5,]    12    11    13    12    15     9     8     7    10     7    13    13
##      [,63] [,64] [,65] [,66] [,67] [,68] [,69] [,70] [,71] [,72] [,73] [,74]
## [1,]    10    11    11    13    14    12    15    11    14    18    13     4
## [2,]     6    13    11    10    12    11     9    11     9     7    18    14
## [3,]    12     9     6     8     7     6    14     7     6     7     8     9
## [4,]     8     6     8    12    14     8    17     7     7    12     9    11
## [5,]    10     9    10     8    10     9    12    11    10     6    10    13
##      [,75] [,76] [,77] [,78] [,79] [,80] [,81] [,82] [,83] [,84] [,85] [,86]
## [1,]    12    17     7    15    17    10    10    10     8    11    13    10
## [2,]     8    10    10     5     8     5    10     9    13    15     9     9
## [3,]     7     6     6    12    18    12     9     7     8    10     8    12
## [4,]    13    13     8     8     7     8     7     7    18    10    12    10
## [5,]    11     6     8     8    11    15    12    17     7     9    14     3
##      [,87] [,88] [,89] [,90] [,91] [,92] [,93] [,94] [,95] [,96] [,97] [,98]
## [1,]    12    18     7     5    11     7     4    13    11    11     9     7
## [2,]    14    10     8    12     8    12    12    10    11    11     8    11
## [3,]    14     9    12     7     7     4    11     6    13     9     9    10
## [4,]     7     6     8     7    10     7    10    13    11    10    12    12
## [5,]    13    13     8    12    13    11     4     5    11    11    11     7
##      [,99] [,100]
## [1,]     7      6
## [2,]     7      6
## [3,]     7      9
## [4,]    13      9
## [5,]     7      6
| That's correct!

Correct.

  |============================================================================             |  85%
| replicate() created a matrix, each column of which contains 5 random numbers generated from a
| Poisson distribution with mean 10. Now we can find the mean of each column in my_pois using the
| colMeans() function. Store the result in a variable called cm.

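replicate() created a matrix in which each column holds 5 random numbers generated from a Poisson distribution with mean 10. Now use colMeans() to find the mean of each column of my_pois, and store the result in a variable named cm.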

cm <- colMeans(my_pois)
| That's a job well done!

Well done.

  |==============================================================================           |  88%
| And let's take a look at the distribution of our column means by plotting a histogram with
| hist(cm).

Let's look at the distribution of the column means by plotting a histogram with hist(cm).

hist(cm)

| You're the best!

You're the best.

  |=================================================================================        |  91%
| Looks like our column means are almost normally distributed, right? That's the Central Limit
| Theorem at work, but that's a lesson for another day!

...

The column means look almost normally distributed, don't they? This is the Central Limit Theorem at work, a topic for another lesson.
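
The steps above can be collected into one short script. For a Poisson distribution with mean 10 the variance is also 10, so the mean of 5 values should have a standard deviation of about sqrt(10 / 5), roughly 1.41. The seed is an arbitrary assumption, so the numbers will differ from the log above.

```r
set.seed(123)  # arbitrary seed for reproducibility
my_pois <- replicate(100, rpois(5, 10))  # 5 x 100 matrix, one column per group
dim(my_pois)

cm <- colMeans(my_pois)                  # one mean per column, 100 values
all.equal(cm, apply(my_pois, 2, mean))   # colMeans() agrees with a column-wise mean()

mean(cm)  # near 10
sd(cm)    # near sqrt(10 / 5)
```

Calling hist(cm) on this vector should show the roughly bell-shaped histogram described above.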



  |====================================================================================     |  94%
| All of the standard probability distributions are built into R, including exponential (rexp()),
| chi-squared (rchisq()), gamma (rgamma()), .... Well, you see the pattern.

...

All of the standard probability distributions are built into R, including exponential (rexp()), chi-squared (rchisq()), and gamma (rgamma()). You can see the naming pattern.
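
The pattern is the prefix r followed by an abbreviated distribution name. Base R uses the same names with the prefixes d (density), p (cumulative probability), and q (quantile). A brief sketch, where the parameter values and variable names are arbitrary choices made for illustration:

```r
set.seed(7)  # arbitrary seed
e <- rexp(3, rate = 2)               # exponential
ch <- rchisq(3, df = 4)              # chi-squared
g <- rgamma(3, shape = 2, rate = 1)  # gamma

# The same naming scheme with other prefixes, shown for the normal distribution
dens <- dnorm(0)     # density of N(0, 1) at 0, equal to 1 / sqrt(2 * pi)
prob <- pnorm(1.96)  # P(Z <= 1.96), about 0.975
quant <- qnorm(0.975) # about 1.96
```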

  |======================================================================================   |  97%
| Simulation is practically a field of its own and we've only skimmed the surface of what's
| possible. I encourage you to explore these and other functions further on your own.

...

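Simulation is practically a field of its own, and this lesson has only skimmed the surface of what is possible. You are encouraged to explore these and other functions further on your own.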

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No

Selection: 2

Would you like to receive credit on Coursera.org for completing this course?

Choose 2 (No). This has been explained several times already, so the details are omitted here.

| Excellent work!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

Excellent.

You have reached the end of the lesson and are returned to the main menu.

Choose a course, or enter 0 to quit swirl.

1: R Programming
2: Go to the list of `swirl` courses.

Selection: 0 (entered by SL)

| Leaving swirl now. Type swirl() to resume.

Leaving swirl. To resume, type swirl().

>

2.14. Dates and Times

This lesson covers how R handles dates and times.
The translation and commentary start from the point where the lesson is selected. If anything is unclear, review 1.2 or the beginning of 1.1.

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 14

Lesson 14, Dates and Times, was selected.

  |                                                                                         |   0%

| R has a special way of representing dates and times, which can be helpful if you're working with
| data that show how something changes over time (i.e. time-series data) or if your data contain
| some other temporal information, like dates of birth.

...

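R has a special way of representing dates and times. It is useful when you work with data that show how something changes over time (time-series data), or with data that contain other temporal information such as dates of birth.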

  |==                                                                                       |   3%
| Dates are represented by the 'Date' class and times are represented by the 'POSIXct' and
| 'POSIXlt' classes. Internally, dates are stored as the number of days since 1970-01-01 and times
| are stored as either the number of seconds since 1970-01-01 (for 'POSIXct') or a list of
| seconds, minutes, hours, etc. (for 'POSIXlt').

...

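Dates are represented by the 'Date' class, and times by two classes, 'POSIXct' and 'POSIXlt'. Internally, a date is stored as the number of days since 1970-01-01. A time is stored either as the number of seconds since 1970-01-01 ('POSIXct' class) or as a list of seconds, minutes, hours, and so on ('POSIXlt' class).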

  |=====                                                                                    |   6%
| Let's start by using d1 <- Sys.Date() to get the current date and store it in the variable d1.
| (That's the letter 'd' and the number 1.)

First, use d1 <- Sys.Date() to store the current date in the variable d1. (That is the letter 'd' followed by the number 1.)

> d1 <- Sys.Date()

| That's the answer I was looking for.

Yes. That is the expected answer.

  |=======                                                                                  |   8%
| Use the class() function to confirm d1 is a Date object.

Use the class() function to confirm that d1 is a Date object.

> class(d1)
[1] "Date"

| Keep working like that and you'll get there!

Keep going like that.

  |==========                                                                               |  11%
| We can use the unclass() function to see what d1 looks like internally. Try it out.

The unclass() function shows what d1 looks like internally. Try it.

> unclass(d1)
[1] 18364

| You're the best!

You're the best.

  |============                                                                             |  14%
| That's the exact number of days since 1970-01-01!

...

That is the exact number of days since 1970-01-01.

  |===============                                                                          |  17%
| However, if you print d1 to the console, you'll get today's date -- YEAR-MONTH-DAY. Give it a
| try.

However, if you print d1 to the console, today's date is shown in YEAR-MONTH-DAY form. Try it.

> d1
[1] "2020-04-12"

| Great job!

Nicely done.

  |=================                                                                        |  19%
| What if we need to reference a date prior to 1970-01-01? Create a variable d2 containing
| as.Date("1969-01-01").

What happens when we need a date before 1970-01-01? Create a variable named d2 containing as.Date("1969-01-01").

> d2 <- as.Date("1969-01-01")

| You are doing so well!

You are doing very well.

  |====================                                                                     |  22%
| Now use unclass() again to see what d2 looks like internally.

Use unclass() again to see what d2 looks like internally.

> unclass(d2)
[1] -365

| You are really on a roll!

You are on a roll.

  |======================                                                                   |  25%
| As you may have anticipated, you get a negative number. In this case, it's -365, since
| 1969-01-01 is exactly one calendar year (i.e. 365 days) BEFORE 1970-01-01.

...

As you may have expected, the result is negative. It is -365 because 1969-01-01 is exactly one calendar year (365 days) before 1970-01-01.
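
The day counts seen so far can be reproduced with fixed dates, which avoids depending on the day the code is run. A small sketch, where the variable names are illustrative and 18364 is the value printed by unclass(d1) in the log above:

```r
epoch <- as.Date("1970-01-01")
unclass(epoch)                  # 0: the origin itself
unclass(as.Date("1969-01-01"))  # -365: one non-leap year before the origin

# Going the other way: turn a day count back into a Date
d <- as.Date(18364, origin = "1970-01-01")
format(d)                       # "2020-04-12"
```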

  |=========================                                                                |  28%
| Now, let's take a look at how R stores times. You can access the current date and time using the
| Sys.time() function with no arguments. Do this and store the result in a variable called t1.

Now let's see how R stores times. Calling the Sys.time() function with no arguments returns the current date and time. Store the result in a variable named t1.

> t1 <- Sys.time()

| Excellent work!

Excellent.

  |===========================                                                              |  31%
| View the contents of t1.

Let's look at t1.

> t1
[1] "2020-04-12 09:22:52 JST"

| Keep up the great work!

Keep it up.

  |==============================                                                           |  33%
| And check the class() of t1.

Check the class of t1 with class().

> class(t1)
[1] "POSIXct" "POSIXt" 

| That's correct!

Correct.

  |================================                                                         |  36%
| As mentioned earlier, POSIXct is just one of two ways that R represents time information. (You
| can ignore the second value above, POSIXt, which just functions as a common language between
| POSIXct and POSIXlt.) Use unclass() to see what t1 looks like internally -- the (large) number
| of seconds since the beginning of 1970.

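As mentioned earlier, POSIXct is one of the two ways R represents time information. (You can ignore the second value, POSIXt, which serves as a common layer between POSIXct and POSIXlt.) Use unclass() to see what t1 looks like internally: the large number of seconds since the beginning of 1970.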

> unclass(t1)
[1] 1586650973

| Great job!

Nicely done.
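
The second count can be checked against a fixed time. The sketch below uses the value 1586650973 from the log and interprets it in UTC, which is an assumption made to keep the example independent of the local time zone. JST is 9 hours ahead of UTC, which is consistent with the 09:22 shown for t1 above. The variable names are illustrative.

```r
t_utc <- as.POSIXct("2020-04-12 00:22:53", tz = "UTC")
secs <- as.numeric(t_utc)  # seconds since 1970-01-01 00:00:00 UTC
secs                       # 1586650973

# Convert the number of seconds back into a POSIXct time
back <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(back, "%Y-%m-%d %H:%M:%S")

floor(secs / 86400)        # 18364 whole days, matching the Date example
```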

  |===================================                                                      |  39%
| By default, Sys.time() returns an object of class POSIXct, but we can coerce the result to
| POSIXlt with as.POSIXlt(Sys.time()). Give it a try and store the result in t2.

By default, Sys.time() returns a POSIXct object, but as.POSIXlt(Sys.time()) converts the result to POSIXlt. Try it and store the result in t2.

> t2 <- as.POSIXlt(Sys.time())

| Perseverance, that's the answer.

You persevered. That is the answer.

  |=====================================                                                    |  42%
| Check the class of t2.

Check the class of t2.

> class(t2)
[1] "POSIXlt" "POSIXt" 

| That's correct!

Correct.

  |========================================                                                 |  44%
| Now view its contents.

Now look at its contents.

> t2
[1] "2020-04-12 09:24:52 JST"

| You are amazing!

You are amazing.

  |==========================================                                               |  47%
| The printed format of t2 is identical to that of t1. Now unclass() t2 to see how it is different
| internally.

The printed format of t2 is identical to that of t1. Now use unclass() to see how t2 differs internally.

> unclass(t2)
$sec
[1] 52.63071

$min
[1] 24

$hour
[1] 9

$mday
[1] 12

$mon
[1] 3

$year
[1] 120

$wday
[1] 0

$yday
[1] 102

$isdst
[1] 0

$zone
[1] "JST"

$gmtoff
[1] 32400

attr(,"tzone")
[1] ""    "JST" "JDT"

| You are amazing!

You are amazing.

  |============================================                                             |  50%
| t2, like all POSIXlt objects, is just a list of values that make up the date and time. Use
| str(unclass(t2)) to have a more compact view.

Like every POSIXlt object, t2 is simply a list of the values that make up the date and time. Use str(unclass(t2)) for a more compact view.

> str(unclass(t2))
List of 11
 $ sec   : num 52.6
 $ min   : int 24
 $ hour  : int 9
 $ mday  : int 12
 $ mon   : int 3
 $ year  : int 120
 $ wday  : int 0
 $ yday  : int 102
 $ isdst : int 0
 $ zone  : chr "JST"
 $ gmtoff: int 32400
 - attr(*, "tzone")= chr [1:3] "" "JST" "JDT"

| Great job!

Nicely done.

  |===============================================                                          |  53%
| If, for example, we want just the minutes from the time stored in t2, we can access them with
| t2$min. Give it a try.

For example, to get just the minutes of the time stored in t2, use t2$min. Try it.

> t2$min
[1] 24

| All that hard work is paying off!

All that hard work is paying off.
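
Some of the POSIXlt fields listed above are offsets rather than calendar values, which explains why the log shows mon 3 for April and year 120 for 2020. A short sketch using a fixed time close to the one in the log. The UTC time zone and the variable name lt are assumptions made for reproducibility.

```r
lt <- as.POSIXlt("2020-04-12 09:24:52", tz = "UTC")

lt$min          # 24
lt$mon + 1      # 4: mon counts months from 0 (January is 0)
lt$year + 1900  # 2020: year counts years since 1900
lt$wday         # 0: days of the week run 0-6, with Sunday as 0
lt$yday + 1     # 103: yday counts days of the year from 0
```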

  |=================================================                                        |  56%
| Now that we have explored all three types of date and time objects, let's look at a few
| functions that extract useful information from any of these objects -- weekdays(), months(), and
| quarters().

...

We have now explored all three types of date and time objects. Next come a few functions that extract useful information from any of them: weekdays(), months(), and quarters().

  |====================================================                                     |  58%
| The weekdays() function will return the day of week from any date or time object. Try it out on
| d1, which is the Date object that contains today's date.

The weekdays() function returns the day of the week for any date or time object. Try it on d1, the Date object that holds today's date.

> weekdays(d1)
[1] "Sunday"
[Japanese system] "日曜日"

| Keep working like that and you'll get there!

Keep going like that.

  |======================================================                                   |  61%
| The months() function also works on any date or time object. Try it on t1, which is the POSIXct
| object that contains the current time (well, it was the current time when you created it).

The months() function also works on any date or time object. Try it on t1, the POSIXct object that holds the current time (more precisely, the time at which it was created).

> months(t1)
[1] "April"
[Japanese system] "4月"

| Excellent job!

Excellent.

  |=========================================================                                |  64%
| The quarters() function returns the quarter of the year (Q1-Q4) from any date or time object.
| Try it on t2, which is the POSIXlt object that contains the time at which you created it.

The quarters() function returns the quarter of the year (Q1-Q4) for any date or time object. Try it on t2, the POSIXlt object that holds the time at which it was created.

> quarters(t2)
[1] "Q2"

| Great job!

Nicely done.
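
As the two outputs above show, weekdays() and months() return names in the language of the system. When a result must not depend on that setting, numeric values are one option. A sketch with a fixed date, where the variable names are illustrative:

```r
d <- as.Date("2020-04-12")

weekdays(d)  # name in the system language, e.g. "Sunday" or "日曜日"
months(d)    # name in the system language, e.g. "April" or "4月"
q <- quarters(d)  # "Q2" regardless of language

# Language-independent alternatives
month_num <- as.integer(format(d, "%m"))  # 4
wday_num <- as.POSIXlt(d)$wday            # 0, because Sunday is 0
```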

  |===========================================================                              |  67%
| Often, the dates and times in a dataset will be in a format that R does not recognize. The
| strptime() function can be helpful in this situation.

...

Dates and times in a dataset are often in a format that R does not recognize. The strptime() function helps in that situation.

  |==============================================================                           |  69%
| strptime() converts character vectors to POSIXlt. In that sense, it is similar to as.POSIXlt(),
| except that the input doesn't have to be in a particular format (YYYY-MM-DD).

...

strptime() converts character vectors to POSIXlt. In that sense it resembles as.POSIXlt(), except that the input does not have to be in a particular format such as YYYY-MM-DD.

  |================================================================                         |  72%
| To see how it works, store the following character string in a variable called t3: "October 17,
| 1986 08:24" (with the quotes).

To see how it works, store the character string "October 17, 1986 08:24", including the double quotes, in a variable named t3.

> t3 <- "October 17, 1986 08:24"

| You are really on a roll!

You are on a roll.

  |===================================================================                      |  75%
| Now, use strptime(t3, "%B %d, %Y %H:%M") to help R convert our date/time object to a format that
| it understands. Assign the result to a new variable called t4. (You should pull up the
| documentation for strptime() if you'd like to know more about how it works.)

Use strptime(t3, "%B %d, %Y %H:%M") so that R can convert the string to a date/time object it understands. Store the result in a new variable named t4. (Read the Help page for strptime() to learn more about how it works.)

> t4 <- strptime(t3, "%B %d, %Y %H:%M")

| Excellent work!

Excellent.

  |=====================================================================                    |  78%
| Print the contents of t4.

Print the contents of t4.

> t4
[1] "1986-10-17 08:24:00 UTC"
[Japanese system] NA

| That's the answer I was looking for.

Yes. That is the expected answer.

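Note: strptime() behaves differently depending on whether the system is Japanese or English. If you run swirl on RStudio.cloud (an English system), the exercise works as written. On a Japanese system, t4 later evaluates to NA, but you can simply carry on with the lesson. To do the same thing on a Japanese system, replace "October 17, 1986 08:24" with t3 <- "1986年 10月 17日, 08:24" and call strptime(t3, "%Y年 %m月 %d日, %H: %M"). You can check this with, for example, strptime("1986年 10月 17日, 08:24", "%Y年 %m月 %d日, %H: %M"). UTC stands for Coordinated Universal Time. When swirl is used in Japan, the time zone shown is JST (Japan Standard Time).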
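
The NA arises because %B matches month names in the language of the current LC_TIME setting. One possible workaround, sketched here as an assumption and not as part of the swirl lesson, is to switch LC_TIME to "C" for the duration of the call so that English month names are recognized. The tz = "UTC" argument and the variable name old_locale are also choices made for this sketch.

```r
old_locale <- Sys.getlocale("LC_TIME")     # remember the current setting
invisible(Sys.setlocale("LC_TIME", "C"))   # "C" uses English month names

t3 <- "October 17, 1986 08:24"
t4 <- strptime(t3, "%B %d, %Y %H:%M", tz = "UTC")

invisible(Sys.setlocale("LC_TIME", old_locale))  # restore the original setting

class(t4)                      # "POSIXlt" "POSIXt"
format(t4, "%Y-%m-%d %H:%M")   # "1986-10-17 08:24"
```

Formats that use only numbers, such as "%Y-%m-%d %H:%M", avoid the issue because they contain no month or weekday names.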

  |========================================================================                 |  81%
| That's the format we've come to expect. Now, let's check its class().

That is the format we expect. Now check its class with class().

> class(t4)
[1] "POSIXlt" "POSIXt" 

| You are doing so well!

You are doing very well.

  |==========================================================================               |  83%
| Finally, there are a number of operations that you can perform on dates and times, including
| arithmetic operations (+ and -) and comparisons (<, ==, etc.)

...

Finally, a number of operations can be performed on dates and times, including arithmetic (+ and -) and comparisons (<, ==, and so on).

  |=============================================================================            |  86%
| The variable t1 contains the time at which you created it (recall you used Sys.time()). Confirm
| that some time has passed since you created t1 by using the 'greater than' operator to compare
| it to the current time: Sys.time() > t1

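The variable t1 holds the time at which it was created with Sys.time(). Confirm that some time has passed since then by comparing it with the current time using the 'greater than' operator: Sys.time() > t1.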

> Sys.time() > t1
[1] TRUE

| Great job!

Nicely done.

  |===============================================================================          |  89%
| So we know that some time has passed, but how much? Try subtracting t1 from the current time
| using Sys.time() - t1. Don't forget the parentheses at the end of Sys.time(), since it is a
| function.

So some time has passed, but how much? Subtract t1 from the current time with Sys.time() - t1. Sys.time() is a function, so do not forget the parentheses at the end.

> Sys.time() - t1
Time difference of 9.99543 mins

| You are quite good my friend!

You are doing quite well.


  |==================================================================================       |  92%
| The same line of thinking applies to addition and the other comparison operators. If you want
| more control over the units when finding the above difference in times, you can use difftime(),
| which allows you to specify a 'units' parameter.

...

The same reasoning applies to addition and to the other comparison operators. For more control over the units of a time difference, use difftime(), which accepts a 'units' argument.

  |====================================================================================     |  94%
| Use difftime(Sys.time(), t1, units = 'days') to find the amount of time in DAYS that has passed
| since you created t1.

Use difftime(Sys.time(), t1, units = 'days') to find how many days have passed since t1 was created.

> difftime(Sys.time(), t1, units = 'days')
Time difference of 0.007301148 days

| You are amazing!

You are amazing.
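
Because Sys.time() changes on every call, the arithmetic is easier to follow with two fixed times. A sketch with illustrative values, where the times, the UTC time zone, and the variable names are assumptions:

```r
t_start <- as.POSIXct("2020-04-12 09:22:52", tz = "UTC")
t_end <- t_start + 600  # adding a number to a POSIXct adds seconds

t_end > t_start         # TRUE
t_end - t_start         # Time difference of 10 mins (units chosen automatically)

# difftime() lets you fix the units yourself
d_days <- difftime(t_end, t_start, units = "days")
as.numeric(d_days)                                    # 600 / 86400
as.numeric(difftime(t_end, t_start, units = "secs"))  # 600
```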

  |=======================================================================================  |  97%
| In this lesson, you learned how to work with dates and times in R. While it is important to
| understand the basics, if you find yourself working with dates and times often, you may want to
| check out the lubridate package by Hadley Wickham.

...

In this lesson you learned how R handles dates and times. Understanding the basics matters, but if you work with dates and times often, the lubridate package by Hadley Wickham is worth a look.

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

Would you like to receive credit on Coursera.org for completing this course?

Choose 1 (No). This has been explained several times already, so the details are omitted here.

| Excellent work!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

Excellent.

You have reached the end of the lesson and are returned to the main menu.

Choose a course, or enter 0 to quit swirl.

1: R Programming
2: Go to the list of `swirl` courses.

Selection: 0 (entered by SL)

| Leaving swirl now. Type swirl() to resume.

Leaving swirl. To resume, type swirl().

>

2.15. Base Graphics

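This lesson covers graphs drawn with base R, that is, with the functions that ship with R and need no extra package to be loaded. R can draw a wide variety of graphs, and this lesson covers only a small part of them. Beginners are nowadays often advised to start with the ggplot2 package, but learning the very basics is not a waste of time.
The translation and commentary start from the point where the lesson is selected. If anything is unclear, review 1.2 or the beginning of 1.1.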

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files        3: Sequences of Numbers    
 4: Vectors                    5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                      9: Functions               
10: lapply and sapply         11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times           15: Base Graphics           


Selection: 15

Lesson 15, Base Graphics, was selected.

  |                                                                                         |   0%

| One of the greatest strengths of R, relative to other programming languages, is the ease with
| which we can create publication-quality graphics. In this lesson, you'll learn about base
| graphics in R.

...

One of R's greatest strengths, compared with other programming languages, is the ease with which publication-quality graphics can be created. This lesson covers base graphics in R.

  |==                                                                                       |   2%
| We do not cover the more advanced portions of graphics in R in this lesson. These include
| lattice, ggplot2 and ggvis.

...

This lesson does not cover the more advanced graphics systems in R, such as the lattice, ggplot2, and ggvis packages.

  |====                                                                                     |   4%
| There is a school of thought that this approach is backwards, that we should teach ggplot2
| first. See http://varianceexplained.org/r/teach_ggplot2_to_beginners/ for an outline of this
| view.

...

There is a school of thought that this approach is backwards and that ggplot2 should be taught first. For an outline of this view, see http://varianceexplained.org/r/teach_ggplot2_to_beginners/.

  |======                                                                                   |   7%
| Load the included data frame cars with data(cars).

First, load the data frame cars with data(cars).

cars is one of the data frames that ship with R.

data(cars)
| Perseverance, that's the answer.

You persevered. That is the answer.

  |========                                                                                 |   9%
| To fix ideas, we will work with simple data frames. Our main goal is to introduce various
| plotting functions and their arguments. All the output would look more interesting with larger,
| more complex data sets.

...

To fix ideas, we work with simple data frames. The main goal is to introduce various plotting functions and their arguments. The output would look more interesting with larger, more complex data sets.

  |==========                                                                               |  11%
| Pull up the help page for cars.

Open the Help page for cars.

> ?cars

| Excellent work!

Excellent.

  |============                                                                             |  13%
| As you can see in the help page, the cars data set has only two variables: speed and stopping
| distance. Note that the data is from the 1920s.

...

As the Help page shows, the cars data set has only two variables: speed and stopping distance. Note that the data were recorded in the 1920s.

  |==============                                                                           |  15%
| Run head() on the cars data.

Apply the head() function to cars.

head(cars)
| You got it!

That's right.

  |===============                                                                          |  17%
| Before plotting, it is always a good idea to get a sense of the data. Key R commands for doing
| so include, dim(), names(), head(), tail() and summary().

...

Before plotting, it is always a good idea to get a sense of the data. Key commands for this include dim(), names(), head(), tail(), and summary().
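
Applied to cars, the commands named above look like this. It is a brief sketch, and the choice to show three rows in head() and tail() is arbitrary.

```r
data(cars)

dim(cars)      # 50 rows and 2 columns
names(cars)    # "speed" "dist"
head(cars, 3)  # first three rows
tail(cars, 3)  # last three rows
summary(cars)  # minimum, quartiles, mean, and maximum of each column
```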

  |=================                                                                        |  20%
| Run the plot() command on the cars data frame.

Apply plot() to the cars data frame.

plot(cars)

| You got it right!

That's right.

  |===================                                                                      |  22%
| As always, R tries very hard to give you something sensible given the information that you have
| provided to it. First, R notes that the data frame you have given it has just two columns, so it
| assumes that you want to plot one column versus the other.

...

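As always, R tries hard to produce something sensible from the information it is given. First, R notices that the data frame has just two columns, so it assumes that you want to plot one column against the other.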

  |=====================                                                                    |  24%
| Second, since we do not provide labels for either axis, R uses the names of the columns. Third,
| it creates axis tick marks at nice round numbers and labels them accordingly. Fourth, it uses
| the other defaults supplied in plot().

...

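Second, since no label was given for either axis, R uses the column names. Third, it places axis tick marks at nice round numbers and labels them accordingly. Fourth, it uses the defaults of plot() for everything else.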

  |=======================                                                                  |  26%
| We will now spend some time exploring plot, but many of the topics covered here will apply to
| most other R graphics functions. Note that 'plot' is short for scatterplot.

...

We now spend some time exploring plot, but many of the topics covered here apply to most other R graphics functions as well. Note that 'plot' is short for scatterplot.

  |=========================                                                                |  28%
| Look up the help page for plot().

Open the Help page for plot().

> ?plot

| Great job!

Nicely done.

  |===========================                                                              |  30%
| The help page for plot() highlights the different arguments that the function can take. The two
| most important are x and y, the variables that will be plotted. For the next set of questions,
| include the argument names in your answers. That is, do not type plot(cars$speed, cars$dist),
| although that will work. Instead, use plot(x = cars$speed, y = cars$dist).

...

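The Help page for plot() highlights the arguments the function can take. The two most important are x and y, the variables to be plotted. In the next set of exercises, include the argument names in your answers. That is, instead of plot(cars$speed, cars$dist), which also works, type plot(x = cars$speed, y = cars$dist).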

  |=============================                                                            |  33%
| Use plot() command to show speed on the x-axis and dist on the y-axis from the cars data frame.
| Use the form of the plot command in which vectors are explicitly passed in as arguments for x
| and y.

Use plot() to show speed on the x-axis and dist on the y-axis from the cars data frame. Pass the vectors explicitly as the x and y arguments.

plot(x = cars$speed, y = cars$dist)

| You are doing so well!

You are doing very well.

  |===============================                                                          |  35%
| Note that this produces a slightly different answer than plot(cars). In this case, R is not sure
| what you want to use as the labels on the axes, so it just uses the arguments which you pass in,
| data frame name and dollar signs included.

...

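Note that this produces a slightly different plot from plot(cars). In this case R cannot tell what you want as axis labels, so it uses the arguments exactly as you passed them, data frame name and dollar sign included.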
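
The default labels appear to be built from the unevaluated argument expressions. The helper label_of() below is hypothetical, written only for this sketch, and imitates that behavior with deparse(substitute()). It is not a function from R's graphics code.

```r
# Hypothetical helper: returns the text of the expression passed as x
label_of <- function(x) deparse(substitute(x))

label_of(cars$speed)  # "cars$speed", the label seen on the x-axis
label_of(cars$dist)   # "cars$dist", the label seen on the y-axis

plot(x = cars$speed, y = cars$dist)  # axes labelled cars$speed and cars$dist
plot(cars)                           # axes labelled speed and dist
```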

  |=================================                                                        |  37%
| Note that there are other ways to call the plot command, i.e., using the "formula" interface.
| For example, we get a similar plot to the above with plot(dist ~ speed, cars). However, we will
| wait till later in the lesson before using the formula interface.

...

There are other ways to call plot, for instance through the "formula" interface. For example, plot(dist ~ speed, cars) gives a plot similar to the one above. The formula interface is left until later in the lesson.

  |===================================                                                      |  39%
| Use plot() command to show dist on the x-axis and speed on the y-axis from the cars data frame.
| This is the opposite of what we did above.

Use plot() with the cars data frame to show dist on the x-axis and speed on the y-axis. This is the reverse of the plot above.

plot(x = cars$dist, y = cars$speed)

| You are doing so well!

You are doing very well.

  |=====================================                                                    |  41%
| It probably makes more sense for speed to go on the x-axis since stopping distance is a function
| of speed more than the other way around. So, for the rest of the questions in this portion of
| the lesson, always assign the arguments accordingly.

...

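Putting speed on the x-axis is probably more natural, since stopping distance depends on speed and not the other way around. For the remaining questions in this part of the lesson, assign the arguments as in the first plot (speed to x, dist to y).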

  |=======================================                                                  |  43%
| In fact, you can assume that the answers to the next few questions are all of the form plot(x =
| cars$speed, y = cars$dist, ...) but with various arguments used in place of the ...

In fact, the answers to the next few exercises all have the form plot(x = cars$speed, y = cars$dist, ...), with various arguments in place of the dots.

  |=========================================                                                |  46%
| Recreate the plot with the label of the x-axis set to "Speed".

Recreate the plot with the x-axis labelled “Speed”.

plot(x = cars$speed, y = cars$dist, xlab = "Speed")

| Keep up the great work!

Keep it up.

  |===========================================                                              |  48%
| Recreate the plot with the label of the y-axis set to "Stopping Distance".

Recreate the plot once more, this time with the y-axis labelled “Stopping Distance”.

plot(x = cars$speed, y = cars$dist, ylab = "Stopping Distance")

| You got it!

That's right.

  |============================================                                             |  50%
| Recreate the plot with "Speed" and "Stopping Distance" as axis labels.

Now plot with “Speed” and “Stopping Distance” as the two axis labels.

plot(x = cars$speed, y = cars$dist, xlab = "Speed", ylab = "Stopping Distance")

| Excellent work!

Excellent.

  |==============================================                                           |  52%
| The reason that plots(cars) worked at the beginning of the lesson was that R was smart enough to
| know that the first element (i.e., the first column) in cars should be assigned to the x
| argument and the second element to the y argument. To save on typing, the next set of answers
| will all be of the form, plot(cars, ...) with various arguments added.

...

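plot(cars) worked at the start of the lesson because R was smart enough to assign the first element of cars (that is, its first column) to the x argument and the second element to the y argument. To save typing, the next few answers all have the form plot(cars, ...) with various arguments added.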

  |================================================                                         |  54%
| For each question, we will only want one additional argument at a time. Of course, you can pass
| in more than one argument when doing a real project.

...

Each exercise asks for only one additional argument. In real work, of course, you can pass several arguments at once.

  |==================================================                                       |  57%
| Plot cars with a main title of "My Plot". Note that the argument for the main title is "main"
| not "title".

Plot cars with the main title “My Plot”. The argument for the main title is “main”, not “title”.

plot(cars, main = "My Plot")

| You're the best!

You're the best.

  |====================================================                                     |  59%
| Plot cars with a sub title of "My Plot Subtitle".

Plot cars with the subtitle “My Plot Subtitle”.

plot(cars, sub = "My Plot Subtitle")

| You got it!

That's right.

  |======================================================                                   |  61%
| The plot help page (?plot) only covers a small number of the many arguments that can be passed
| in to plot() and to other graphical functions. To begin to explore the many other options, look
| at ?par. Let's look at some of the more commonly used ones. Continue using plot(cars, ...) as
| the base answer to these questions.

...

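The help page for plot (?plot) covers only a small part of the many arguments accepted by plot() and other graphics functions. ?par is the place to start exploring the other options. Some of the most commonly used ones follow; keep using plot(cars, ...) as the base of your answers.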

  |========================================================                                 |  63%
| Plot cars so that the plotted points are colored red. (Use col = 2 to achieve this effect.)

Plot cars with the points colored red (use col = 2).

plot(cars, col = 2)

| You are really on a roll!

You are learning fast.

  |==========================================================                               |  65%
| Plot cars while limiting the x-axis to 10 through 15.  (Use xlim = c(10, 15) to achieve this
| effect.)

Plot cars with the x-axis limited to the range 10 to 15 (use xlim = c(10, 15)).

plot(cars, xlim = c(10, 15))

| Excellent work!

Excellent.

  |============================================================                             |  67%
| You can also change the shape of the symbols in the plot. The help page for points (?points)
| provides the details.

...

The shape of the plotted symbols can also be changed. The help page ?points gives the details.

  |==============================================================                           |  70%
| Plot cars using triangles.  (Use pch = 2 to achieve this effect.)

Plot cars using triangles (use pch = 2).

plot(cars, pch = 2)

| You nailed it! Good job!

Well done.
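The exercises add one argument at a time, but in real work the same arguments are usually combined in a single call. The following sketch is an illustration, not a swirl answer; it uses only the arguments practiced above and a null device (pdf(NULL)) so that it runs without a display.

```r
# Illustration only: the arguments practiced one at a time, combined.
pdf(NULL)  # null graphics device; omit in RStudio to see the plot

plot(cars,
     main = "My Plot",             # main title
     sub  = "My Plot Subtitle",    # subtitle
     xlab = "Speed",               # x-axis label
     ylab = "Stopping Distance",   # y-axis label
     col  = 2,                     # red points
     pch  = 2,                     # triangles
     xlim = c(10, 15))             # show only speeds from 10 to 15

invisible(dev.off())

# xlim only changes the visible window; the data are untouched.
# This counts how many observations fall inside that window.
n_visible <- sum(cars$speed >= 10 & cars$speed <= 15)
```

Because xlim does not filter the data frame, points outside the range still exist in cars; they are simply drawn outside the visible region.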

  |================================================================                         |  72%
| Arguments like "col" and "pch" may not seem very intuitive. And that is because they aren't! So,
| many/most people use more modern packages, like ggplot2, for creating their graphics in R.

...

Arguments such as “col” and “pch” may not look very intuitive, and indeed they are not. For that reason many, perhaps most, people use more modern packages such as ggplot2 to create graphics in R.

  |==================================================================                       |  74%
| It is, however, useful to have an introduction to base graphics because many of the idioms in
| lattice and ggplot2 are modeled on them.

...

Still, an introduction to base graphics is useful, because many of the idioms in lattice and ggplot2 are modeled on it.

  |====================================================================                     |  76%
| Let's now look at some other functions in base graphics that may be useful, starting with
| boxplots.

...

Next come a few other useful base graphics functions, starting with boxplots.

  |======================================================================                   |  78%
| Load the mtcars data frame.

Load the mtcars data frame.

data(mtcars)
| You are amazing!

Amazing.

  |========================================================================                 |  80%
| Anytime that you load up a new data frame, you should explore it before using it. In the middle
| of a swirl lesson, just type play(). This temporarily suspends the lesson (without losing the
| work you have already done) and allows you to issue commands like dim(mtcars) and head(mtcars).
| Once you are done examining the data, just type nxt() and the lesson will pick up where it left
| off.

...

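Whenever you load a new data frame, explore it before using it. In the middle of a swirl lesson, type play(). This suspends the lesson temporarily without losing the work done so far, so that you can try commands such as dim(mtcars) and head(mtcars). When you have finished examining the data, type nxt() and the lesson resumes where it left off.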
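A first look at a new data frame might go as in the sketch below. It is meant to be typed after play() (or in a fresh R session) and uses only base R functions on the built-in mtcars data; the variable names dims, top and cyl_counts are chosen here for illustration.

```r
# Illustration only: a quick first look at a newly loaded data frame.
data(mtcars)

dims <- dim(mtcars)        # number of rows and columns
top  <- head(mtcars, 3)    # first three rows
str(mtcars)                # type and first values of every column

# cyl is used as the grouping variable in the boxplot exercise,
# so it is worth checking which values it takes and how often.
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)
```

After exploring, nxt() returns to the lesson at the point where it was suspended.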

  |==========================================================================               |  83%
| Look up the help page for boxplot().

Look up the help page for boxplot().

> ?boxplot

| All that hard work is paying off!

Your hard work is paying off.

  |===========================================================================              |  85%
| Instead of adding data columns directly as input arguments, as we did with plot(), it is often
| handy to pass in the entire data frame. This is what the "data" argument in boxplot() allows.

...

Instead of passing data columns directly as arguments, as with plot(), it is often handier to pass the entire data frame. The “data” argument of boxplot() allows exactly this.

  |=============================================================================            |  87%
| boxplot(), like many R functions, also takes a "formula" argument, generally an expression with
| a tilde ("~") which indicates the relationship between the input variables. This allows you to
| enter something like mpg ~ cyl to plot the relationship between cyl (number of cylinders) on the
| x-axis and mpg (miles per gallon) on the y-axis.

...

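Like many R functions, boxplot() also takes a “formula” argument, generally an expression with a tilde (“~”) that describes the relationship between the input variables. For example, mpg ~ cyl puts cyl (number of cylinders) on the x-axis and mpg (miles per gallon) on the y-axis.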

  |===============================================================================          |  89%
| Use boxplot() with formula = mpg ~ cyl and data = mtcars to create a box plot.

Use boxplot() with formula = mpg ~ cyl and data = mtcars to create a box plot.

boxplot(formula = mpg ~ cyl, data = mtcars)

| You are really on a roll!

You are learning fast.

  |=================================================================================        |  91%
| The plot shows that mpg is much lower for cars with more cylinders. Note that we can use the
| same set of arguments that we explored with plot() above to add axis labels, titles and so on.

The plot shows that cars with more cylinders have lower mpg. As with plot(), axis labels, a main title and so on can be added with the same arguments.

  |===================================================================================      |  93%
| When looking at a single variable, histograms are a useful tool. hist() is the associated R
| function. Like plot(), hist() is best used by just passing in a single vector.

When looking at a single variable, a histogram is a useful tool. hist() is the corresponding R function; like plot(), it is best used by passing in just a single vector.

  |=====================================================================================    |  96%
| Use hist() with the vector mtcars$mpg to create a histogram.

Use hist() to create a histogram of mtcars$mpg.

hist(mtcars$mpg)

| Excellent work!

Excellent.
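The labelling arguments learned with plot() carry over to boxplot() and hist(). The sketch below is an illustration outside the swirl lesson: the titles and labels are made up, and med_mpg and h are names introduced here to show the numbers behind the two pictures.

```r
# Illustration only: boxplot() and hist() with titles and axis labels.
pdf(NULL)  # null graphics device; omit in RStudio to see the plots

boxplot(mpg ~ cyl, data = mtcars,
        main = "Fuel economy by cylinder count",
        xlab = "Number of cylinders",
        ylab = "Miles per gallon")

hist(mtcars$mpg,
     main = "Distribution of mpg",
     xlab = "Miles per gallon")

invisible(dev.off())

# The numbers behind the pictures:
# median mpg per cylinder group (the thick line in each box) ...
med_mpg <- tapply(mtcars$mpg, mtcars$cyl, median)
# ... and the bin counts of the histogram, computed without drawing.
h <- hist(mtcars$mpg, plot = FALSE)
```

The medians in med_mpg fall as the number of cylinders rises, which is the pattern the box plot makes visible.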

  |=======================================================================================  |  98%
| In this lesson, you learned how to work with base graphics in R. The best place to go from here
| is to study the ggplot2 package. If you want to explore other elements of base graphics, then
| this web page (http://www.ling.upenn.edu/~joseff/rstudy/week4.html) provides a useful overview.

...

In this lesson you learned how to work with base graphics in R. The best next step is the ggplot2 package. For other elements of base graphics, the web page http://www.ling.upenn.edu/~joseff/rstudy/week4.html gives a useful overview.

  |=========================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

Would you like credit on Coursera.org for completing this course?

Choose 1. This has been explained several times already, so the details are omitted here.

| You are quite good my friend!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 0

You are doing well.

You have reached the end of this lesson. Returning to the main menu...

Choose a course, or enter 0 to exit swirl.

1: R Programming
2: Go to the swirl course repository.

Selection: 0 (entered by SL)

| Leaving swirl now. Type swirl() to resume.

Leaving swirl. To resume, type swirl().

>

This completes all of R Programming, Lessons 1 through 15.

3. Regression Models: The basics of regression modeling in R

3.1 About This swirl Course

This course was created by Team swirl for use in a Coursera course. The basic information is as follows.

  • Intermediate level: The basics of regression modeling in R
  • The swirl Course Network: Regression Models
  • swirl::install_course("Regression Models"): enter this command in the Console first to install the course.

For the Coursera course, see the JHU Data Science Lab page. In particular, a free online textbook is available for Regression Models for Data Science in R, and each part of it links to YouTube videos. The textbook appears to be updated little by little, but the YouTube videos currently seem to be the 2015 edition.

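As explained inside swirl, the slides and other materials that accompany this course are kept at https://github.com/DataScienceSpecialization/courses. For those unfamiliar with GitHub: that repository links to https://www.coursera.org/specializations/jhu-data-science, which shows that the materials were used in the Coursera courses.
Understanding everything from swirl alone may be difficult, but there are many hints, and it is possible to make progress by working through the problems.
The Coursera course uses slides accompanied by explanatory videos. On the GitHub page, above the list of files, there is a Clone or Download button, and under it Open in Desktop and Download ZIP. Those who use GitHub and know GitHub Desktop will understand both options; those unfamiliar with GitHub should choose Download ZIP, and the download will complete after a short while. Depending on the system, the archive is extracted either automatically or by double-clicking, but either way you obtain all of the files. The folder 07_RegressionModels is the one related to this swirl course. Open it, then open, for example, 01_01_introduction at the top. It contains several files; double-click the one marked html, or open it in Chrome, Internet Explorer, Safari or another browser, to view the slides. Rmd stands for R Markdown; opening such a file in RStudio shows how the slides were produced.

The slides are cited from time to time in the lessons, but you can proceed without reading them. When something is unclear, it is fine to move on with skip(). Studying a little linear regression online or from a free book and then trying again should gradually deepen your understanding.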

There are many ways to study. Find the one that suits you.

Contents

  1. Introduction
  2. Residuals
  3. Least Squares Estimation
  4. Residual Variation
  5. Introduction to Multivariable Regression
  6. MultiVar Examples - multivariable regression examples, part 1
  7. MultiVar Examples2 - multivariable regression examples, part 2
  8. MultiVar Examples3 - multivariable regression examples, part 3
  9. Residuals Diagnostics and Variation
  10. Variance Inflation Factors (VIF)
  11. Overfitting and Underfitting
  12. Binary Outcomes
  13. Count Outcomes

Note: Packages

  1. Estimation
  • In 4.3 Lesson 3: Least Squares Estimation, if you are using RStudio, the manipulate package provides an interactive graph partway through the lesson. Install manipulate from Tools > Install Packages in the RStudio menu, or run install.packages("manipulate") in the Console. Then run library(manipulate) to make it available before starting swirl.
  • CRAN manipulate: Interactive Plots for RStudio
    • Interactive plotting functions for use within RStudio. The manipulate function accepts a plotting expression and a set of controls (e.g. slider, picker, checkbox, or button) which are used to dynamically change values within the expression. When a value is changed using its corresponding control the expression is automatically re-executed and the plot is redrawn.
  • Manual: manipulate
  2. car Package
  3. UsingR and HistData

Below, comments are added only at points where readers are likely to stumble.

3.2 From Installation to Lesson 1: Introduction

Start with library(swirl), which declares that swirl will be used and loads the package.

When starting something new, it is a good idea to clear the Workspace with rm(list=ls()). Variables left over under the same name can otherwise cause errors whose origin is hard to trace.

> library(swirl)

| Hi! Type swirl() when you are ready to begin.

> swirl()

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you
| did then. If you are new, call yourself something unique.

What shall I call you? SL

| Would you like to continue with one of these lessons?

1: The R Programming Environment Data Manipulation
2: No. Let me start something new.

Selection: 2

| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: Regression Models
4: The R Programming Environment
5: Take me to the swirl course repository!

Selection: 3

| Please choose a lesson, or type 0 to return to course menu.

 1: Introduction
 2: Residuals
 3: Least Squares Estimation
 4: Residual Variation
 5: Introduction to Multivariable Regression
 6: MultiVar Examples
 7: MultiVar Examples2
 8: MultiVar Examples3
 9: Residuals Diagnostics and Variation
10: Variance Inflation Factors
11: Overfitting and Underfitting
12: Binary Outcomes
13: Count Outcomes

Selection: 1
  |                                                                               |   0%

| Introduction to Regression Models. (Slides for this and other Data Science courses may
| be found at github https://github.com/DataScienceSpecialization/courses if you want to
| use them. They must be downloaded as a zip file and viewed locally. This lesson
| corresponds to Regression_Models/01_01_introduction.)

...

The lesson begins with an example from Francis Galton, who originated regression analysis.

  |====                                                                           |   5%
| This is the first lesson on Regression Models. We'll begin with the concept of
| "regression toward the mean" and illustrate it with some pioneering work of the father
| of forensic science, Sir Francis Galton.

...

  |========                                                                       |  10%
| Sir Francis studied the relationship between heights of parents and their children.
| His work showed that parents who were taller than average had children who were also
| tall but closer to the average height. Similarly, parents who were shorter than
| average had children who were also shorter than average but less so than mom and dad.
| That is, they were closer to the average height. From one generation to the next the
| heights moved closer to the average or regressed toward the mean.

...

  |===========                                                                    |  14%
| For this lesson we'll use Sir Francis's parent/child height data which we've taken the
| liberty to load for you as the variable, galton. (Data is from John Verzani's website,
| http://wiener.math.csi.cuny.edu/UsingR/.) So let's get started!

...

  |===============                                                                |  19%
| Here is a plot of Galton's data, a set of 928 parent/child height pairs. Moms' and
| dads' heights were averaged together (after moms' heights were adjusted by a factor of
| 1.08). In our plot we used the R function "jitter" on the children's heights to
| highlight heights that occurred most frequently. The dark spots in each column rise
| from left to right suggesting that children's heights do depend on their parents'.
| Tall parents have tall children and short parents have short children.

...

  |===================                                                            |  24%
| Here we add a red (45 degree) line of slope 1 and intercept 0 to the plot. If children
| tended to be the same height as their parents, we would expect the data to vary evenly
| about this line. We see this isn't the case. On the left half of the plot we see a
| concentration of heights above the line, and on the right half we see the
| concentration below the line.

...

  |=======================                                                        |  29%
| Now we've added a blue regression line to the plot. This is the line which has the
| minimum variation of the data around it. (For theory see the slides.) Its slope is
| greater than zero indicating that parents' heights do affect their children's. The
| slope is also less than 1 as would have been the case if children tended to be the
| same height as their parents.

...

  |==========================                                                     |  33%
| Now's your chance to plot in R. Type "plot(child ~ parent, galton)" at the R prompt.
plot(child ~ parent, galton)

| You are doing so well!

  |==============================                                                 |  38%
| You'll notice that this plot looks a lot different than the original we displayed.
| Why? Many people are the same height to within measurement error, so points fall on
| top of one another. You can see that some circles appear darker than others. However,
| by using R's function "jitter" on the children's heights, we can spread out the data
| to simulate the measurement errors and make high frequency heights more visible.

...

  |==================================                                             |  43%
| Now it's your turn to try. Just type "plot(jitter(child,4) ~ parent,galton)" and see
| the magic.
plot(jitter(child,4) ~ parent,galton)

| Excellent work!

  |======================================                                         |  48%
| Now for the regression line. This is quite easy in R. The function lm (linear model)
| needs a "formula" and dataset. You can type "?formula" for more information, but, in
| simple terms, we just need to specify the dependent variable (children's heights) ~
| the independent variable (parents' heights).

...

  |=========================================                                      |  52%
| So generate the regression line and store it in the variable regrline. Type "regrline
| <- lm(child ~ parent, galton)"
regrline <- lm(child ~ parent, galton)
| Excellent job!

  |=============================================                                  |  57%
| Now add the regression line to the plot with "abline". Make the line wide and red for
| visibility. Type "abline(regrline, lwd=3, col='red')"
> abline(regrline, lwd=3, col='red')

| You got it right!

  |=================================================                              |  62%
| The regression line will have a slope and intercept which are estimated from data.
| Estimates are not exact. Their accuracy is gauged by theoretical techniques and
| expressed in terms of "standard error." You can use "summary(regrline)" to examine the
| Galton regression line. Do this now.
summary(regrline)
## 
## Call:
## lm(formula = child ~ parent, data = galton)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.8050 -1.3661  0.0487  1.6339  5.9264 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 23.94153    2.81088   8.517   <2e-16 ***
## parent       0.64629    0.04114  15.711   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.239 on 926 degrees of freedom
## Multiple R-squared:  0.2105, Adjusted R-squared:  0.2096 
## F-statistic: 246.8 on 1 and 926 DF,  p-value: < 2.2e-16
| Your dedication is inspiring!

This is the summary output. The first task is to understand what it contains.

  |=====================================================                          |  67%
| The slope of the line is the estimate of the coefficient, or multiplier, of "parent",
| the independent variable of our data (in this case, the parents' heights). From the
| output of "summary" what is the slope of the regression line?

1: 23.94153
2: .64629
3: .04114

Selection: 2

| You nailed it! Good job!

  |========================================================                       |  71%
| What is the standard error of the slope?

1: .64629
2: .04114
3: 23.94153

Selection: 2

| Nice work!

  |============================================================                   |  76%
| A coefficient will be within 2 standard errors of its estimate about 95% of the time.
| This means the slope of our regression is significantly different than either 0 or 1
| since (.64629) +/- (2*.04114) is near neither 0 nor 1.

...
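The two-standard-error argument in the message above can be reproduced by hand. The sketch below is an illustration that copies the slope estimate and its standard error from the summary(regrline) output shown earlier; the variable names are introduced here and are not part of the lesson.

```r
# Illustration only: slope estimate +/- 2 standard errors,
# using the values printed by summary(regrline).
slope <- 0.64629   # Estimate for "parent"
se    <- 0.04114   # Std. Error for "parent"

interval <- slope + c(-2, 2) * se   # lower and upper end
print(interval)

# Neither 0 (no relationship) nor 1 (children as tall as their
# parents) lies inside the interval.
contains_0 <- interval[1] <= 0 && 0 <= interval[2]
contains_1 <- interval[1] <= 1 && 1 <= interval[2]
```

The interval runs from 0.56401 to 0.72857, so both 0 and 1 fall well outside it, which is what the message means by "significantly different".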

  |================================================================               |  81%
| We're now adding two blue lines to indicate the means of the children's heights
| (horizontal) and the parents' (vertical). Note that these lines and the regression
| line all intersect in a point. Pretty cool, huh? We'll talk more about this in a later
| lesson. (Something you can look forward to.)

...
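That the regression line passes through the point of means can be checked numerically. The galton data come from the UsingR package, which may not be installed, so the sketch below substitutes the built-in cars data as a stand-in; fit_cars and the other names are introduced here for illustration.

```r
# Illustration only: a least-squares line passes through
# (mean of x, mean of y). cars stands in for galton here.
fit_cars <- lm(dist ~ speed, data = cars)

mean_speed <- mean(cars$speed)
mean_dist  <- mean(cars$dist)

# Prediction at the mean of the predictor ...
pred_at_mean <- predict(fit_cars,
                        newdata = data.frame(speed = mean_speed))
# ... equals the mean of the outcome (up to rounding error).
through_means <- all.equal(unname(pred_at_mean), mean_dist)

# Moving one unit to the right of the mean changes the prediction
# by exactly the slope.
pred_plus_one <- predict(fit_cars,
                         newdata = data.frame(speed = mean_speed + 1))
step <- unname(pred_plus_one - pred_at_mean)
slope_step <- all.equal(step, unname(coef(fit_cars)["speed"]))
```

With galton loaded, the same check would use lm(child ~ parent, galton) and the means of parent and child.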

  |====================================================================           |  86%
| The slope of a line shows how much of a change in the vertical direction is produced
| by a change in the horizontal direction. So, parents "1 inch" above the mean in height
| tend to have children who are only .65 inches above the mean. The green triangle
| illustrates this point. From the mean, moving a "1 inch distance" horizontally to the
| right (increasing the parents' height) produces a ".65 inch" increase in the vertical
| direction (children's height).

...

  |=======================================================================        |  90%
| Similarly, parents who are 1 inch below average in height have children who are only
| .65 inches below average height. The purple triangle illustrates this. From the mean,
| moving a "1 inch distance" horizontally to the left (decreasing the parents' height)
| produces a ".65 inch" decrease in the vertical direction (children's height).

...

  |===========================================================================    |  95%
| This concludes our lesson on regression toward the mean. We hope you found it above
| average!

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No

Selection: 2

| Nice work!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 1.

3.3 Lesson 2: Residuals

This lesson covers the meaning of the results obtained from linear regression, starting with residuals.

The equation of the regression line, written in terms of mean, sd (standard deviation), cov (covariance) and cor (correlation):
\[y = a + bx + \epsilon,\]
\[b = \frac{\sum_{i=1}^n (x_i-\bar{x})(y_i - \bar{y})}{\sum_{i=1}^n(x_i - \bar{x})^2} = \frac{\mbox{cov}(y, x)}{\mbox{sd}(x)^2} = \mbox{cor}(y, x)\,\frac{\mbox{sd}(y)}{\mbox{sd}(x)},\]
\[\mbox{cov}(y, x) = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})(y_i - \bar{y}),\: \mbox{sd}(x) = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2},\]
\[a = \bar{y}-b\bar{x},\quad \bar{x} = \mbox{mean}(x),\quad \bar{y} = \mbox{mean}(y).\]

With fit <- lm(y ~ x):

  • fit$coef[1] \(=a\): the \(y\)-intercept
  • fit$coef[2] \(=b\): the slope
  • \(\hat{y} = a + bx\): the fitted values
  • \(\epsilon = y - \hat{y} = \mbox{fit\$residuals}\): the residuals

\[Var(Y) = Var(\hat{Y}) + Var(Y - \hat{Y})\]
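The standard least-squares identities for slope, intercept, residuals and variance can be checked numerically. The sketch below is an illustration using the built-in cars data in place of galton (which requires the UsingR package); x, y, a, b and the *_match names are introduced here.

```r
# Illustration only: closed-form least-squares estimates versus lm().
x <- cars$speed   # predictor (stand-in for galton$parent)
y <- cars$dist    # outcome   (stand-in for galton$child)
fit <- lm(y ~ x)

# Slope and intercept from the closed-form expressions
b <- cov(x, y) / var(x)      # var(x) is sd(x)^2
a <- mean(y) - b * mean(x)

# They agree with what lm() estimated ...
coef_match  <- all.equal(unname(fit$coef), c(a, b))
# ... and with the correlation form of the slope.
slope_match <- all.equal(b, cor(x, y) * sd(y) / sd(x))

# Residuals are the outcome minus the fitted values.
y_hat <- a + b * x
resid_match <- all.equal(unname(fit$residuals), y - y_hat)

# The variance of the outcome splits into a fitted part
# and a residual part.
var_match <- all.equal(var(y), var(y_hat) + var(y - y_hat))
```

Each *_match value is TRUE when the two sides agree up to floating-point error.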

| Would you like to continue with one of these lessons?

1: The R Programming Environment Data Manipulation
2: No. Let me start something new.

Selection: 2

| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: Regression Models
4: The R Programming Environment
5: Take me to the swirl course repository!

Selection: 3

| Please choose a lesson, or type 0 to return to course menu.

 1: Introduction
 2: Residuals
 3: Least Squares Estimation
 4: Residual Variation
 5: Introduction to Multivariable Regression
 6: MultiVar Examples
 7: MultiVar Examples2
 8: MultiVar Examples3
 9: Residuals Diagnostics and Variation
10: Variance Inflation Factors
11: Overfitting and Underfitting
12: Binary Outcomes
13: Count Outcomes

Selection: 2

  |                                                                               |   0%

| Residuals. (Slides for this and other Data Science courses may be found at github
| https://github.com/DataScienceSpecialization/courses. If you care to use them, they
| must be downloaded as a zip file and viewed locally. This lesson corresponds to
| Regression_Models/01_03_ols. Galton data is from John Verzani's website,
| http://wiener.math.csi.cuny.edu/UsingR/)

...

  |==                                                                             |   3%
| This lesson will focus on the residuals, the distances between the actual children's
| heights and the estimates given by the regression line. Since all lines are
| characterized by two parameters, a slope and an intercept, we'll use the least squares
| criteria to provide two equations in two unknowns so we can solve for these
| parameters, the slope and intercept.

...

  |=====                                                                          |   6%
| The first equation says that the "errors" in our estimates, the residuals, have mean
| zero. In other words, the residuals are "balanced" among the data points; they're just
| as likely to be positive as negative. The second equation says that our residuals must
| be uncorrelated with our predictors, the parents’ height. This makes sense - if the
| residuals and predictors were correlated then you could make a better prediction and
| reduce the distances (residuals) between the actual outcomes and the predictions.

...

  |=======                                                                        |   9%
| We'll demonstrate these concepts now. First regenerate the regression line and call it
| fit. Use the R function lm. Recall that by default its first argument is a formula
| such as "child ~ parent" and its second is the dataset, in this case galton.
fit <- lm(child ~ parent, data = galton)
| You got it right!

  |==========                                                                     |  12%
| Now we'll examine fit to see its slope and intercept. The residuals we're interested
| in are stored in the 928-long vector fit$residuals. If you type fit$residuals you'll
| see a lot of numbers scroll by which isn't very useful; however if you type
| "summary(fit)" you will see a more concise display of the regression data. Do this
| now.
summary(fit)
## 
## Call:
## lm(formula = child ~ parent, data = galton)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.8050 -1.3661  0.0487  1.6339  5.9264 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 23.94153    2.81088   8.517   <2e-16 ***
## parent       0.64629    0.04114  15.711   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.239 on 926 degrees of freedom
## Multiple R-squared:  0.2105, Adjusted R-squared:  0.2096 
## F-statistic: 246.8 on 1 and 926 DF,  p-value: < 2.2e-16
| You nailed it! Good job!

  |============                                                                   |  16%
| First check the mean of fit$residuals to see if it's close to 0.
mean(fit$residuals)
## [1] -2.359884e-15
| Nice work!

  |===============                                                                |  19%
| Now check the correlation between the residuals and the predictors. Type
| "cov(fit$residuals, galton$parent)" to see if it's close to 0.
cov(fit$residuals, galton$parent)
## [1] -1.790153e-13
| All that practice is paying off!

  |=================                                                              |  22%
| As shown algebraically in the slides, the equations for the intercept and slope are
| found by supposing a change is made to the intercept and slope. Squaring out the
| resulting expressions produces three summations. The first sum is the original term
| squared, before the slope and intercept were changed. The third sum totals the squared
| changes themselves. For instance, if we had changed fit’s intercept by adding 2, the
| third sum would be the total of 928 4’s. The middle sum is guaranteed to be zero
| precisely when the two equations (the conditions on the residuals) are satisfied.

...

  |====================                                                           |  25%
| We'll verify these claims now. We've defined for you two R functions, est and sqe.
| Both take two inputs, a slope and an intercept. The function est calculates a child's
| height (y-coordinate) using the line defined by the two parameters, (slope and
| intercept), and the parents' heights in the Galton data as x-coordinates.

...

  |======================                                                         |  28%
| Let "mch" represent the mean of the galton childrens' heights and "mph" the mean of
| the galton parents' heights. Let "ic" and "slope" represent the intercept and slope of
| the regression line respectively. As shown in the slides and past lessons, the point
| (mph,mch) lies on the regression line. This means

1: mph = ic + slope*mch
2: mch = ic + slope*mph
3: I haven't the slightest idea.

Selection: 3

| Nice try, but that's not exactly what I was hoping for. Try again.

| A line is the set of all points (x,y) satisfying the equation y = mx + b, where m is
| the slope of the line and b is its intercept. Remember that the point (mph,mch) lies
| on the regression line with intercept ic and slope "slope".

1: mph = ic + slope*mch
2: mch = ic + slope*mph
3: I haven't the slightest idea.

Selection: 2

| You are quite good my friend!

  |=========================                                                      |  31%
| The function sqe calculates the sum of the squared residuals, the differences between
| the actual children's heights and the estimated heights specified by the line defined
| by the given parameters (slope and intercept).  R provides the function deviance to do
| exactly this using a fitted model (e.g., fit) as its argument. However, we provide sqe
| because we'll use it to test regression lines different from fit.

...
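swirl does not print the definitions of est and sqe. The following is a minimal sketch of what they could look like, not swirl's actual source. It assumes simulated heights as a stand-in for the galton data (which ships with the UsingR package); the data frame dat and the simulation parameters are made up for illustration.

```r
# Simulated stand-in for the galton data (assumption: not the real dataset).
set.seed(1)
parent <- rnorm(928, mean = 68.3, sd = 1.8)
child  <- 24 + 0.65 * parent + rnorm(928, sd = 2.2)
dat    <- data.frame(child = child, parent = parent)

# est: heights predicted by the line with the given slope and intercept.
est <- function(slope, intercept) intercept + slope * dat$parent

# sqe: sum of squared differences between actual and predicted heights.
sqe <- function(slope, intercept) sum((dat$child - est(slope, intercept))^2)

fit <- lm(child ~ parent, data = dat)
b   <- unname(coef(fit))   # b[1] = intercept, b[2] = slope

# At the fitted coefficients, sqe agrees with deviance(fit).
all.equal(sqe(b[2], b[1]), deviance(fit))
```

With the real data, replacing dat by galton should give the functions the lesson describes.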

  |===========================                                                    |  34%
| We'll see that when we vary or tweak the slope and intercept values of the regression
| line which are stored in fit$coef, the resulting squared residuals are approximately
| equal to the sum of two sums of squares - that of the original regression residuals
| and that of the tweaks themselves. More precisely, up to numerical error,

...

  |==============================                                                 |  38%
| sqe(ols.slope+sl,ols.intercept+ic) == deviance(fit) + sum(est(sl,ic)^2 )

...

  |================================                                               |  41%
| Equivalently, sqe(ols.slope+sl,ols.intercept+ic) == sqe(ols.slope, ols.intercept) +
| sum(est(sl,ic)^2 )

...

  |===================================                                            |  44%
| The left side of the equation represents the squared residuals of a new line, the
| "tweaked" regression line. The terms "sl" and "ic" represent the variations in the
| slope and intercept respectively. The right side has two terms. The first represents
| the squared residuals of the original regression line and the second is the sum of
| squares of the variations themselves.

...
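The identity can be checked by expanding the square. The symbols below are introduced only for this sketch: e_i is the residual of the original regression line at the data point (x_i, y_i), and sl and ic are the tweaks to the slope and the intercept, so the residual of the tweaked line is e_i - (ic + sl x_i).

```latex
\begin{align*}
\sum_i \bigl(e_i - (ic + sl\,x_i)\bigr)^2
  &= \sum_i e_i^2 \;-\; 2\sum_i e_i\,(ic + sl\,x_i) \;+\; \sum_i (ic + sl\,x_i)^2 \\
  &= \sum_i e_i^2 \;-\; 2\,ic\sum_i e_i \;-\; 2\,sl\sum_i e_i x_i \;+\; \sum_i (ic + sl\,x_i)^2 \\
  &= \sum_i e_i^2 \;+\; \sum_i (ic + sl\,x_i)^2
\end{align*}
```

The last step uses the two conditions on the residuals of the original line: their sum is zero, and the sum of the products e_i x_i is zero. The first remaining term corresponds to deviance(fit) and the second to sum(est(sl,ic)^2).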

  |=====================================                                          |  47%
| We'll demonstrate this now. First extract the intercept from fit$coef and put it in a
| variable called ols.ic . The intercept is the first element in the fit$coef vector,
| that is fit$coef[1].
ols.ic <- fit$coef[1]
| That's correct!

  |========================================                                       |  50%
| Now extract the slope from fit$coef and put it in the variable ols.slope; the slope is
| the second element in the fit$coef vector, fit$coef[2].
ols.slope <- fit$coef[2]
| All that practice is paying off!

  |==========================================                                     |  53%
| Now we'll show you some R code which generates the left and right sides of this
| equation.  Take a moment to look it over. We've formed two 6-long vectors of
| variations, one for the slope and one for the intercept. Then we have two "for" loops
| to generate the two sides of the equation.

...

Contents of res_eqn.R

#Here are the vectors of variations or tweaks
sltweak <- c(.01, .02, .03, -.01, -.02, -.03) #one for the slope
ictweak <- c(.1, .2, .3, -.1, -.2, -.3)  #one for the intercept
lhs <- numeric()
rhs <- numeric()
#left side of eqn is the sum of squares of residuals of the tweaked regression line
for (n in 1:6) lhs[n] <- sqe(ols.slope+sltweak[n],ols.ic+ictweak[n])
#right side of eqn is the sum of squares of original residuals + sum of squares of two tweaks
for (n in 1:6) rhs[n] <- sqe(ols.slope,ols.ic) + sum(est(sltweak[n],ictweak[n])^2)

swirl is showing its ... prompt, so press Enter to continue.

  |============================================                                   |  56%
| Subtract the right side, the vector rhs, from the left, the vector lhs, to see the
| relationship between them. You should get a vector of very small, almost 0, numbers.
> lhs - rhs
[1]  1.264198e-09  2.527486e-09  3.801688e-09 -1.261469e-09 -2.522938e-09 -3.767127e-09

| You got it!

  |===============================================                                |  59%
| You could also use the R function all.equal with lhs and rhs as arguments to test for
| equality. Try it now.

> all.equal(lhs, rhs)
[1] TRUE

| Excellent job!
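The differences above are tiny but not exactly zero, which is why all.equal is used rather than ==. A minimal illustration with two numbers that are equal on paper (the variable names are chosen only for this example):

```r
lhs <- 0.1 + 0.2
rhs <- 0.3

lhs == rhs                    # exact comparison of the stored doubles
isTRUE(all.equal(lhs, rhs))   # comparison up to a small numerical tolerance
```

all.equal returns a character description of the difference when the values differ, so wrapping it in isTRUE() is the usual way to use it inside an if condition.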

  |=================================================                              |  62%
| Now we'll show that the variance in the children's heights is the sum of the variance
| in the OLS estimates and the variance in the OLS residuals. First use the R function
| var to calculate the variance in the children's heights and store it in the variable
| varChild.
varChild <- var(galton$child)
| That's a job well done!

  |====================================================                           |  66%
| Remember that we've calculated the residuals and they're stored in fit$residuals. Use
| the R function var to calculate the variance in these residuals now and store it in
| the variable varRes.
varRes <- var(fit$residuals)
| That's a job well done!

  |======================================================                         |  69%
| Recall that the function "est" calculates the estimates (y-coordinates) of values
| along the regression line defined by the variables "ols.slope" and "ols.ic". Compute
| the variance in the estimates and store it in the variable varEst.

> varEst <- var(est(ols.slope, ols.ic))

| You are amazing!

  |=========================================================                      |  72%
| Now use the function all.equal to compare varChild and the sum of varRes and varEst.

> all.equal(varChild, varRes + varEst)
[1] TRUE

| All that practice is paying off!
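The same decomposition can be tried outside swirl. The sketch below assumes the built-in cars dataset as a stand-in for the Galton data and uses fitted() and resid() in place of the lesson's est function and fit$residuals.

```r
# Built-in cars data as a stand-in (assumption: not the Galton data).
fit <- lm(dist ~ speed, data = cars)

var_data <- var(cars$dist)     # variance of the outcome
var_est  <- var(fitted(fit))   # variance of the values on the regression line
var_res  <- var(resid(fit))    # variance of the residuals

all.equal(var_data, var_est + var_res)
```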

  |===========================================================                    |  75%
| Since variances are sums of squares (and hence always positive), this equation which
| we've just demonstrated, var(data)=var(estimate)+var(residuals), shows that the
| variance of the estimate is ALWAYS less than the variance of the data.

...

  |==============================================================                 |  78%
| Since var(data)=var(estimate)+var(residuals) and variances are always positive, the
| variance of residuals

1: is greater than the variance of data
2: is less than the variance of data
3: is unknown without actual data

Selection: 2

| Nice work!

From here on the lesson uses the attenu dataset (The Joyner-Boore Attenuation Data), which is one of R's built-in datasets. See "attenu: basic information".

  |================================================================               |  81%
| The two properties of the residuals we've emphasized here can be applied to datasets
| which have multiple predictors. In this lesson we've loaded the dataset attenu which
| gives data for 23 earthquakes in California. Accelerations are estimated based on two
| predictors, distance and magnitude.

...

  |===================================================================            |  84%
| Generate the regression line for this data. Type efit <- lm(accel ~ mag+dist, attenu)
| at the R prompt.
efit <- lm(accel ~ mag+dist, attenu)
| All that hard work is paying off!

  |=====================================================================          |  88%
| Verify the mean of the residuals is 0.
mean(efit$residuals)
| You are really on a roll!

  |========================================================================       |  91%
| Using the R function cov verify the residuals are uncorrelated with the magnitude
| predictor, attenu$mag.
cov(efit$residuals, attenu$mag)
## [1] 5.338694e-17
| Perseverance, that's the answer.

  |==========================================================================     |  94%
| Using the R function cov verify the residuals are uncorrelated with the distance
| predictor, attenu$dist.
cov(efit$residuals, attenu$dist)
## [1] 5.253433e-16
| That's a job well done!
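The three checks above (zero mean, zero covariance with mag, zero covariance with dist) can also be done in one step, since each says that the residuals are orthogonal to one column of the design matrix. A minimal sketch using the built-in attenu data; the name X is chosen only for this example.

```r
efit <- lm(accel ~ mag + dist, data = attenu)

# Columns of X: the constant 1, mag and dist.
X <- model.matrix(efit)

# Each entry is sum(regressor * residual); all three should be numerically zero.
crossprod(X, resid(efit))
```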

  |=============================================================================  |  97%
| Congrats! You've finished the course on Residuals. We hope it hasn't left a bad taste
| in your mouth.

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No

Selection: 2

| Excellent work!

| You've reached the end of this lesson! Returning to the main menu...

This is the end of Lesson 2.

3.4 Lesson 3: Least Squares Estimation

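The equation of the regression line is chosen so that the sum of the squared differences between the predicted and the actual values is as small as possible. This method is called the method of least squares.

If you are using RStudio, make sure the manipulate package is installed and that library(manipulate) has been run before you start swirl.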


| Would you like to continue with one of these lessons?

1: The R Programming Environment Data Manipulation
2: No. Let me start something new.

Selection: 2

| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: Regression Models
4: The R Programming Environment
5: Take me to the swirl course repository!

Selection: 3

| Please choose a lesson, or type 0 to return to course menu.

 1: Introduction
 2: Residuals
 3: Least Squares Estimation
 4: Residual Variation
 5: Introduction to Multivariable Regression
 6: MultiVar Examples
 7: MultiVar Examples2
 8: MultiVar Examples3
 9: Residuals Diagnostics and Variation
10: Variance Inflation Factors
11: Overfitting and Underfitting
12: Binary Outcomes
13: Count Outcomes

Selection: 3
  |                                                                               |   0%

| Least Squares Estimation. (Slides for this and other Data Science courses may be found
| at github https://github.com/DataScienceSpecialization/courses. If you care to use
| them, they must be downloaded as a zip file and viewed locally. This lesson
| corresponds to Regression_Models/01_03_ols. Galton data is from John Verzani's
| website, http://wiener.math.csi.cuny.edu/UsingR/)

...

  |====                                                                           |   5%
| In this lesson, if you're using RStudio, you'll be able to play with some of the code
| which appears in the slides. If you're not using RStudio, you can look at the code but
| you won't be able to experiment with the function "manipulate". We provide the code
| for you so you can examine it without having to type it all out.  In RStudio, when the
| edit window displays code, make sure your flashing cursor is back in the console
| window before you hit "Enter" or any keyboard buttons, otherwise you might
| accidentally alter the code. If you do alter the file, in RStudio, you can hit Ctrl z
| in the editor until all the unwanted changes disappear. In other editors, you'll have
| to use whatever key combination performs "undo" to remove all your unwanted changes.

...

  |========                                                                       |  11%
| Here are the Galton data and the regression line seen in the Introduction. The
| regression line summarizes the relationship between parents' heights (the predictors)
| and their children's (the outcomes).

...

  |============                                                                   |  16%
| We learned in the last lesson that the regression line is the line through the data
| which has the minimum (least) squared "error", the vertical distance between the 928
| actual children's heights and the heights predicted by the line. Squaring the
| distances ensures that data points above and below the line are treated the same. This
| method of choosing the 'best' regression line (or 'fitting' a line to the data) is
| known as ordinary least squares.

...

  |=================                                                              |  21%
| As shown in the slides, the regression line contains the point representing the means
| of the two sets of heights. These are shown by the thin horizontal and vertical lines.
| The intersection point is shown by the triangle on the plot. Its x-coordinate is the
| mean of the parents' heights and y-coordinate is the mean of the childrens' heights.

...

  |=====================                                                          |  26%
| As shown in the slides, the slope of the regression line is the correlation between
| the two sets of heights multiplied by the ratio of the standard deviations (childrens'
| to parents' or outcomes to predictors).

...
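The two facts just stated (the line passes through the point of means, and its slope is the correlation times the ratio of standard deviations) determine the regression line completely. A minimal check, assuming the built-in cars dataset as a stand-in for the Galton data:

```r
x <- cars$speed   # predictor
y <- cars$dist    # outcome

slope     <- cor(y, x) * sd(y) / sd(x)
intercept <- mean(y) - slope * mean(x)   # line passes through (mean(x), mean(y))

fit <- lm(y ~ x)
all.equal(c(intercept, slope), unname(coef(fit)))
```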

  |=========================                                                      |  32%
| Here we show code which demonstrates how changing the slope of the regression line
| affects the mean squared error between actual and predicted values. Look it over to
| see how straightforward it is.

...

slope_demo_nomani.R

myPlot <- function(beta){
  y <- galton$child - mean(galton$child)
  x <- galton$parent - mean(galton$parent)
  freqData <- as.data.frame(table(x, y))
  names(freqData) <- c("parent", "child", "freq")  #table(x, y) lists x (parent) first
  plot(
    as.numeric(as.vector(freqData$parent)), 
    as.numeric(as.vector(freqData$child)),
    pch = 21, col = "black", bg = "lightblue",
    cex = .15 * freqData$freq, 
    xlab = "parent", 
    ylab = "child"
  )
  abline(0, beta, lwd = 3)
  points(0, 0, cex = 2, pch = 19)
  mse <- mean( (y - beta * x)^2 )
  title(paste("beta = ", beta, "mse = ", round(mse, 3)))
}
#manipulate(myPlot(beta), beta = slider(0.4, .8, step = 0.02))

To actually run the demo, delete the # at the start of the last line and execute the code.

  |=============================                                                  |  37%
| What RStudio graphics package allows the user to play with the data to see the effects
| of the changes?

1: manipulate
2: abline
3: plot
4: points

Selection: 1

| You got it!

  |=================================                                              |  42%
| Now you can actually play with the code to use R's manipulate function and find the
| minimum squared error. You can adjust the slider with the left mouse button or use the
| right and left arrow keys to see how changing the slope (beta) affects the mean
| squared error (mse). If the slider disappears you can call it back by clicking on the
| little gear in the upper left corner of the plot window.

...

  |=====================================                                          |  47%
| Which value of the slope minimizes the mean squared error?

1: .44
2: .64
3: .70
4: 5

Selection: 2

| You got it right!

  |==========================================                                     |  53%
| What was the minimum mse?

1: .64
2: 5.0
3: .66
4: 44

Selection: install.packages("manipulate")
Enter an item from the menu, or 0 to exit
Selection: 0

| You're close...I can feel it! Try it again.

| You don't want an error that's too big or too small.

1: 44
2: 5.0
3: .64
4: .66

Selection: 2

| You got it right!
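The slider search can also be done without manipulate. The sketch below is not part of the lesson; it assumes simulated, centered heights as a stand-in for the galton data, so the numbers it produces will differ from the quiz answers. The function mse and the other names are chosen only for this example.

```r
# Simulated, centered stand-in for the galton heights (assumption).
set.seed(42)
parent <- rnorm(928, mean = 68.3, sd = 1.8)
child  <- 24 + 0.65 * parent + rnorm(928, sd = 2.2)
x <- parent - mean(parent)
y <- child  - mean(child)

# Mean squared error of the line through the origin with slope beta.
mse <- function(beta) mean((y - beta * x)^2)

# Same grid as the slider in the commented-out manipulate() call.
betas     <- seq(0.4, 0.8, by = 0.02)
grid_best <- betas[which.min(sapply(betas, mse))]

# One-dimensional numerical minimization, and the closed-form minimizer.
opt_best    <- optimize(mse, interval = c(0, 2))$minimum
closed_form <- sum(x * y) / sum(x^2)

c(grid = grid_best, optimize = opt_best, closed_form = closed_form)
```

The grid search can only land on a multiple of the step size, while optimize and the closed form agree to within the optimizer's tolerance.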

  |==============================================                                 |  58%
| Recall that you normalize data by subtracting its mean and dividing by its standard
| deviation. We've done this for the galton child and parent data for you. We've stored
| these normalized values in two vectors, gpa_nor and gch_nor, the normalized galton
| parent and child data.

...

  |==================================================                             |  63%
| Use R's function "cor" to compute the correlation between these normalized data sets.

> cor(gpa_nor, gch_nor)
[1] 0.4587624

| Excellent job!

  |======================================================                         |  68%
| How does this correlation relate to the correlation of the unnormalized data?

1: It is smaller.
2: It is bigger.
3: It is the same.

Selection: 3

| That's a job well done!

  |==========================================================                     |  74%
| Use R's function "lm" to generate the regression line using this normalized data.
| Store it in a variable called l_nor. Use the parents' heights as the predictors
| (independent variable) and the childrens' as the predicted (dependent). Remember, 'lm'
| needs a formula of the form dependent ~ independent. Since we've created the data
| vectors for you there's no need to provide a second "data" argument as you have
| previously.

> l_nor <- lm(gch_nor, gpa_nor)
Error in formula.default(object, env = baseenv()) : invalid formula
> l_nor <- lm(gch_nor ~ gpa_nor)

| Perseverance, that's the answer.

  |==============================================================                 |  79%
| What is the slope of this line?

1: I have no idea
2: The correlation of the 2 data sets
3: 1.

Selection: play()
Enter an item from the menu, or 0 to exit
Selection: 1

| Not quite right, but keep trying.

| Think correlation.

1: The correlation of the 2 data sets
2: I have no idea
3: 1.

Selection: 1

| That's the answer I was looking for.
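Both answers about normalized data can be verified directly. This is a minimal sketch that assumes the built-in cars dataset as a stand-in; x_nor and y_nor play the roles of gpa_nor and gch_nor.

```r
x <- cars$speed
y <- cars$dist

# Normalize: subtract the mean, divide by the standard deviation.
x_nor <- (x - mean(x)) / sd(x)
y_nor <- (y - mean(y)) / sd(y)

# The correlation is unchanged by normalization ...
all.equal(cor(x_nor, y_nor), cor(x, y))

# ... and the slope of the normalized regression equals that correlation.
l_nor <- lm(y_nor ~ x_nor)
all.equal(unname(coef(l_nor)[2]), cor(x, y))
```

This follows from the slope formula cor * sd(y) / sd(x): after normalization both standard deviations are 1.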

  |===================================================================            |  84%
| If you swapped the outcome (Y) and predictor (X) of your original (unnormalized) data,
| (for example, used childrens' heights to predict their parents), what would the slope
| of the new regression line be?

1: I have no idea
2: correlation(X,Y) * sd(X)/sd(Y)
3: 1.
4: the same as the original

Selection: 4

| That's not the answer I was looking for, but try again.

| Since you're swapping X and Y, swap the X and Y in the formula. Swapping X and Y in
| the correlation function doesn't change anything.

1: correlation(X,Y) * sd(X)/sd(Y)
2: I have no idea
3: the same as the original
4: 1.

Selection: 1

| You are doing so well!
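A quick check of the swapped-regression formula, again assuming the built-in cars dataset as a stand-in for the Galton data:

```r
x <- cars$speed
y <- cars$dist

slope_y_on_x <- unname(coef(lm(y ~ x))[2])   # original: x predicts y
slope_x_on_y <- unname(coef(lm(x ~ y))[2])   # swapped:  y predicts x

all.equal(slope_y_on_x, cor(x, y) * sd(y) / sd(x))
all.equal(slope_x_on_y, cor(x, y) * sd(x) / sd(y))

# The product of the two slopes is the squared correlation.
all.equal(slope_y_on_x * slope_x_on_y, cor(x, y)^2)
```

The two slopes are reciprocals of each other only when the correlation is 1 or -1.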

  |=======================================================================        |  89%
| We'll close with a final display of source code from the slides. It plots the galton
| data with three regression lines, the original in red with the children as the
| outcome, a new blue line with the parents' as outcome and childrens' as predictor, and
| a black line with the slope scaled so it equals the ratio of the standard deviations.

...

finalplot.R

#plot the original Galton data points with larger dots for more freq pts
y <- galton$child
x <- galton$parent
freqData <- as.data.frame(table(galton$child, galton$parent))
names(freqData) <- c("child", "parent", "freq")
plot(as.numeric(as.vector(freqData$parent)), 
     as.numeric(as.vector(freqData$child)), 
     pch = 21, col = "black", bg = "lightblue",
     cex = .07 * freqData$freq, xlab = "parent", ylab = "child")

#original regression line, children as outcome, parents as predictor
abline(mean(y) - mean(x) * cor(y, x) * sd(y) / sd(x), #intercept
       sd(y) / sd(x) * cor(y, x),  #slope
       lwd = 3, col = "red")

#new regression line, parents as outcome, children as predictor
abline(mean(y) - mean(x) * sd(y) / sd(x) / cor(y, x), #intercept
       sd(y) / cor(y, x) / sd(x), #slope
       lwd = 3, col = "blue")

#assume correlation is 1 so slope is ratio of std deviations
abline(mean(y) - mean(x) * sd(y) / sd(x), #intercept
       sd(y) / sd(x),  #slope
       lwd = 2)
points(mean(x), mean(y), cex = 2, pch = 19) #big point of intersection
  |===========================================================================    |  95%
| Congrats! You've concluded this lesson on ordinary least squares which are truly
| extraordinary!

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

| You nailed it! Good job!

| You've reached the end of this lesson! Returning to the main menu...

This is the end of Lesson 3.

3.5 Lesson 4: Residual Variation

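This lesson covers the variance of the residuals, that is, of the differences between the observed values and the values predicted by the linear function obtained from the linear regression model.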

| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Regression Models
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 2

| Please choose a lesson, or type 0 to return to course menu.

 1: Introduction
 2: Residuals
 3: Least Squares Estimation
 4: Residual Variation
 5: Introduction to Multivariable Regression
 6: MultiVar Examples
 7: MultiVar Examples2
 8: MultiVar Examples3
 9: Residuals Diagnostics and Variation
10: Variance Inflation Factors
11: Overfitting and Underfitting
12: Binary Outcomes
13: Count Outcomes

Selection: 4
  |                                                                               |   0%

| Residual Variation. (Slides for this and other Data Science courses may be found at
| github https://github.com/DataScienceSpecialization/courses. If you care to use them,
| they must be downloaded as a zip file and viewed locally. This lesson corresponds to
| Regression_Models/01_06_residualVariation. Galton data is from John Verzani's website,
| http://wiener.math.csi.cuny.edu/UsingR/)

...

  |====                                                                           |   5%
| As shown in the slides, residuals are useful for indicating how well data points fit a
| statistical model. They "can be thought of as the outcome (Y) with the linear
| association of the predictor (X) removed. One differentiates residual variation
| (variation after removing the predictor) from systematic variation (variation
| explained by the regression model)."

...

  |=======                                                                        |   9%
| It can also be shown that, given a model, the maximum likelihood estimate of the
| variance of the random error is the average squared residual. However, since our
| linear model with one predictor requires two parameters we have only (n-2) degrees of
| freedom. Therefore, to calculate an "average" squared residual to estimate the
| variance we use the formula 1/(n-2) * (the sum of the squared residuals). If we
| divided the sum of the squared residuals by n, instead of n-2, the result would give a
| biased estimate.

...

  |===========                                                                    |  14%
| To see this we'll use our favorite Galton height data. First regenerate the regression
| line and call it fit. Use the R function lm and recall that by default its first
| argument is a formula such as "child ~ parent" and its second is the dataset, in this
| case galton.
fit <- lm(child ~ parent, data = galton)
| That's correct!

  |==============                                                                 |  18%
| First, we'll use the residuals (fit$residuals) of our model to estimate the standard
| deviation (sigma) of the error. We've already defined n for you as the number of
| points in Galton's dataset (928).

...

  |==================                                                             |  23%
| Calculate the sum of the squared residuals divided by the quantity (n-2).  Then take
| the square root.
sqrt(sum(fit$residuals^2)/(n-2))
## [1] 2.238547
| Your dedication is inspiring!

  |======================                                                         |  27%
| Now look at the "sigma" portion of the summary of fit, "summary(fit)$sigma".
summary(fit)$sigma
## [1] 2.238547
| Keep up the great work!

  |=========================                                                      |  32%
| Pretty cool, huh?

...

  |=============================                                                  |  36%
| Another cool thing - take the sqrt of "deviance(fit)/(n-2)" at the R prompt.
sqrt(deviance(fit)/(n-2))
## [1] 2.238547
| You're the best!
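The three computations of sigma can be compared side by side. This sketch assumes the built-in cars dataset as a stand-in for the Galton data and uses df.residual() to obtain the degrees of freedom instead of writing n - 2 by hand.

```r
fit <- lm(dist ~ speed, data = cars)
n   <- nrow(cars)

by_hand <- sqrt(sum(resid(fit)^2) / (n - 2))
by_dev  <- sqrt(deviance(fit) / df.residual(fit))   # df.residual(fit) is n - 2 here
by_summ <- summary(fit)$sigma

all.equal(by_hand, by_dev)
all.equal(by_hand, by_summ)
```

Using df.residual(fit) keeps the computation correct for models with more predictors, where the divisor is no longer n - 2.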

  |================================                                               |  41%
| Another useful fact shown in the slides was

...

  |====================================                                           |  45%
| Total Variation = Residual Variation + Regression Variation

...

  |========================================                                       |  50%
| Recall the beauty of the slide full of algebra which proved this fact. It had a bunch
| of Y's, some with hats and some with bars and several summations of squared values.
| The Y's with hats were the estimates provided by the model. (They were on the
| regression line.) The Y with the bar was the mean or average of the data. Which sum of
| squared term represented Total Variation?

1: Yi-mean(Yi)
2: Yi_hat-mean(Yi)
3: Yi-Yi_hat

Selection: 1

| You are really on a roll!

  |===========================================                                    |  55%
| Which sum of squared term represents Residual Variation?

1: Yi-Yi_hat
2: Yi_hat-mean(Yi)
3: Yi-mean(Yi)

Selection: 1

| That's a job well done!
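Written out with the notation of the two questions above (Y_i the data, \hat{Y}_i the estimates on the regression line, \bar{Y} the mean of the data), the fact reads as follows. The term not asked about is the Regression Variation.

```latex
\[
\underbrace{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}_{\text{Total Variation}}
=
\underbrace{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}_{\text{Residual Variation}}
+
\underbrace{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}_{\text{Regression Variation}}
\]
```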

  |===============================================                                |  59%
| The term R^2 represents the percent of total variation described by the model, the
| regression variation (the term we didn't ask about in the preceding multiple choice
| questions). Also, since it is a percent we need a ratio or fraction of sums of
| squares. Let's do this now for our Galton data.

...

  |==================================================                             |  64%
| We'll start with easy steps. Calculate the mean of the children's heights and store it
| in a variable called mu. Recall that we reference the childrens' heights with the
| expression 'galton$child' and the parents' heights with the expression
| 'galton$parent'.
mu <- mean(galton$child)
| Your dedication is inspiring!

  |======================================================                         |  68%
| Recall that centering data means subtracting the mean from each data point. Now
| calculate the sum of the squares of the centered children's heights and store the
| result in a variable called sTot. This represents the Total Variation of the data.
sTot <- sum((galton$child - mu)^2)
| You are doing so well!

  |=========================================================                      |  73%
| Now create the variable sRes. Use the R function deviance to calculate the sum of the
| squares of the residuals. These are the distances between the children's heights and
| the regression line. This represents the Residual Variation.
sRes <- deviance(fit)
| You nailed it! Good job!

  |=============================================================                  |  77%
| Finally, the ratio sRes/sTot represents the percent of total variation contributed by
| the residuals. To find the percent contributed by the model, i.e., the regression
| variation, subtract the fraction sRes/sTot from 1.  This is the value R^2.
1 - sRes/sTot
## [1] 0.2104629
| You are doing so well!

  |=================================================================              |  82%
| For fun you can compare your result to the values shown in summary(fit)$r.squared to
| see if it looks familiar. Do this now.
summary(fit)$r.squared
## [1] 0.2104629
| Perseverance, that's the answer.

  |====================================================================           |  86%
| To see some real magic, compute the square of the correlation of the galton data, the
| children and parents. Use the R function cor.
cor(galton$parent,galton$child)^2
## [1] 0.2104629
| Excellent work!
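The steps of this lesson fit into a few lines. The sketch below assumes the built-in cars dataset as a stand-in for the Galton data; sReg is a name introduced here for the regression variation, which the lesson does not compute explicitly.

```r
fit <- lm(dist ~ speed, data = cars)
mu  <- mean(cars$dist)

sTot <- sum((cars$dist - mu)^2)     # total variation
sRes <- deviance(fit)               # residual variation
sReg <- sum((fitted(fit) - mu)^2)   # regression variation

all.equal(sTot, sRes + sReg)

r2 <- sReg / sTot
all.equal(r2, 1 - sRes / sTot)
all.equal(r2, summary(fit)$r.squared)
all.equal(r2, cor(cars$speed, cars$dist)^2)
```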

  |========================================================================       |  91%
| We'll now summarize useful facts about R^2. It is the percentage of variation
| explained by the regression model. As a percentage it is between 0 and 1. It also
| equals the sample correlation squared. However, R^2 doesn't tell the whole story.

...

  |===========================================================================    |  95%
| Congrats! You've finished this lesson on Residual Variation.

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No

Selection: 2

| That's the answer I was looking for.

| You've reached the end of this lesson! Returning to the main menu...

This is the end of Lesson 4.

3.6 Lesson 5: Introduction to Multivariable Regression

This lesson shows that multivariable regression can be reduced to regression in a single variable.


| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Regression Models
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 2

| Please choose a lesson, or type 0 to return to course menu.

 1: Introduction
 2: Residuals
 3: Least Squares Estimation
 4: Residual Variation
 5: Introduction to Multivariable Regression
 6: MultiVar Examples
 7: MultiVar Examples2
 8: MultiVar Examples3
 9: Residuals Diagnostics and Variation
10: Variance Inflation Factors
11: Overfitting and Underfitting
12: Binary Outcomes
13: Count Outcomes

Selection: 5
  |                                                                               |   0%

| Introduction to Multivariable Regression. (Slides for this and other Data Science
| courses may be found at github https://github.com/DataScienceSpecialization/courses.
| If you care to use them, they must be downloaded as a zip file and viewed locally.
| This lesson corresponds to Regression_Models/02_01_multivariate. Galton data is from
| John Verzani's Using R website, http://wiener.math.csi.cuny.edu/UsingR/)

...

  |===                                                                            |   4%
| In this lesson we'll illustrate that regression in many variables amounts to a series
| of regressions in one. Using regression in one variable, we'll show how to eliminate
| any chosen regressor, thus reducing a regression in N variables, to a regression in
| N-1. Hence, if we know how to do a regression in 1 variable, we can do a regression in
| 2. Once we know how to do a regression in 2 variables, we can do a regression in 3,
| and so on. We begin with the galton data and a review of eliminating the intercept by
| subtracting the means.

...

  |======                                                                         |   8%
| When we perform a regression in one variable, such as lm(child ~ parent, galton), we
| get two coefficients, a slope and an intercept. The intercept is really the
| coefficient of a special regressor which has the same value, 1, at every sample. The
| function, lm, includes this regressor by default.

...

  |=========                                                                      |  12%
| We'll demonstrate by substituting an all-ones regressor of our own. This regressor
| must have the same number of samples as galton (928.) Create such an object and name
| it ones, using ones <- rep(1, nrow(galton)), or some equivalent expression.

rep() repeats (replicates) a value. rep(1, nrow(galton)) repeats the value 1 once for each row of the galton data, that is, once per observation.

ones <- rep(1, nrow(galton))
| Nice work!

  |============                                                                   |  15%
| The galton data has already been loaded. The default intercept can be excluded by
| using -1 in the formula. Perform a regression which substitutes our regressor, ones,
| for the default using lm(child ~ ones + parent -1, galton). Since we want the result
| to print, don't assign it to a variable.
lm(child ~ ones + parent -1, galton)
## 
## Call:
## lm(formula = child ~ ones + parent - 1, data = galton)
## 
## Coefficients:
##    ones   parent  
## 23.9415   0.6463
| Perseverance, that's the answer.

  |===============                                                                |  19%
| The coefficient of ones is 23.9415. Now use the default, lm(child ~ parent, galton),
| to show the intercept has the same value. This time, DO NOT suppress the intercept
| with -1.
lm(child ~ parent, galton)
## 
## Call:
## lm(formula = child ~ parent, data = galton)
## 
## Coefficients:
## (Intercept)       parent  
##     23.9415       0.6463
| You nailed it! Good job!
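The same comparison can be made programmatically instead of by reading the two printouts. This sketch assumes the built-in cars dataset as a stand-in for galton; the column ones is added with transform() so that both fits use a single data frame.

```r
d <- transform(cars, ones = 1)   # add an all-ones column

with_ones <- lm(dist ~ ones + speed - 1, data = d)   # explicit constant regressor
default   <- lm(dist ~ speed, data = d)              # lm's built-in intercept

all.equal(unname(coef(with_ones)), unname(coef(default)))
```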

  |==================                                                             |  23%
| The regression in one variable given by lm(child ~ parent, galton) really involves two
| regressors, the variable, parent, and a regressor of all ones.

1: False
2: True

Selection: 2

| Perseverance, that's the answer.

  |=====================                                                          |  27%
| In earlier lessons we demonstrated that the regression line given by lm(child ~
| parent, galton) goes through the point x=mean(parent), y=mean(child). We also showed
| that if we subtract the mean from each variable, the regression line goes through the
| origin, x=0, y=0, hence its intercept is zero. Thus, by subtracting the means, we
| eliminate one of the two regressors, the constant, leaving just one, parent. The
| coefficient of the remaining regressor is the slope.

...
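
A small check of the claim about subtracting the means, again using the built-in cars data as a stand-in for galton (an assumption; x, y and fit_centered are hypothetical names):

```r
# Centre both variables on their means.
x <- cars$speed - mean(cars$speed)
y <- cars$dist - mean(cars$dist)

fit_centered <- lm(y ~ x)
fit_raw <- lm(dist ~ speed, cars)

# The intercept should be zero up to rounding error,
# and the slope should be unchanged by centring.
coef(fit_centered)
coef(fit_raw)
```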

  |========================                                                       |  31%
| Subtracting the means to eliminate the intercept is a special case of a general
| technique which is sometimes called Gaussian Elimination. As it applies here, the
| general technique is to pick one regressor and to replace all other variables by the
| residuals of their regressions against that one.

...

  |===========================                                                    |  35%
| Suppose, as claimed, that subtracting a variable's mean is a special case of replacing
| the variable with a residual. In this special case, it would be the residual of a
| regression against what?

1: The outcome
2: The constant, 1
3: The variable itself

Selection: 2

| Your dedication is inspiring!

  |==============================                                                 |  38%
| The mean of a variable is the coefficient of its regression against the constant, 1.
| Thus, subtracting the mean is equivalent to replacing a variable by the residual of
| its regression against 1. In an R formula, the constant regressor can be represented
| by a 1 on the right hand side. Thus, the expression, lm(child ~ 1, galton), regresses
| child against the constant, 1. Recall that in the galton data, the mean height of a
| child was 68.09 inches. Use lm(child ~ 1, galton) to compare the resulting coefficient
| (the intercept) and the mean height of 68.09. Since we want the result to print, don't
| assign it a name.
lm(child ~ 1, galton)
## 
## Call:
## lm(formula = child ~ 1, data = galton)
## 
## Coefficients:
## (Intercept)  
##       68.09
| All that practice is paying off!

  |=================================                                              |  42%
| The mean of a variable is equal to its regression against the constant, 1.

1: False
2: True

Selection: 2

| That's the answer I was looking for.
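
The statement can be verified on any numeric variable. A minimal sketch with the built-in cars data (a stand-in chosen for illustration; fit_const is a hypothetical name):

```r
# Regress dist on the constant 1 only.
fit_const <- lm(dist ~ 1, cars)

# The single coefficient should equal the mean of dist ...
all.equal(unname(coef(fit_const)), mean(cars$dist))

# ... and the residuals should equal dist with its mean subtracted.
all.equal(unname(resid(fit_const)), cars$dist - mean(cars$dist))
```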

  |====================================                                           |  46%
| To illustrate the general case we'll use the trees data from the datasets package. The
| idea is to predict the Volume of timber which a tree might produce from measurements
| of its Height and Girth. To avoid treating the intercept as a special case, we have
| added a column of 1's to the data which we shall use in its place. Please take a
| moment to inspect the data using either View(trees) or head(trees).

The trees dataset is built in and ships with R. The datasets package is attached by default, so trees can be used without calling library(datasets).

head(trees)
| That's correct!

  |========================================                                       |  50%
| A file of relevant code has been copied to your working directory and sourced. The
| file, elimination.R, should have appeared in your editor. If not, please open it
| manually.

...

Contents of elimination.R

# Regress the given variable on the given predictor,
# suppressing the intercept, and return the residual.
regressOneOnOne <- function(predictor, other, dataframe){
  # Point A. Create a formula such as Girth ~ Height -1
  formula <- paste0(other, " ~ ", predictor, " - 1")
  # Use the formula in a regression and return the residual.
  resid(lm(formula, dataframe))
}

# Eliminate the specified predictor from the dataframe by
# regressing all other variables on that predictor
# and returning a data frame containing the residuals
# of those regressions.
eliminate <- function(predictor, dataframe){
  # Find the names of all columns except the predictor.
  others <- setdiff(names(dataframe), predictor)
  # Calculate the residuals of each when regressed against the given predictor
  temp <- sapply(others, function(other)regressOneOnOne(predictor, other, dataframe))
  # sapply returns a matrix of residuals; convert to a data frame and return.
  as.data.frame(temp)
}
  |===========================================                                    |  54%
| The general technique is to pick one predictor and to replace all other variables by
| the residuals of their regressions against that one. The function, regressOneOnOne, in
| eliminate.R performs the first step of this process. Given the name of a predictor and
| one other variable, other, it returns the residual of other when regressed against
| predictor. In its first line, labeled Point A, it creates a formula. Suppose that
| predictor were 'Girth' and other were 'Volume'. What formula would it create?

1: Volume ~ Girth
2: Volume ~ Girth - 1
3: Girth ~ Volume - 1

Selection: 2

| That's a job well done!
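
Point A can be tried on its own at the Console. The sketch below only rebuilds the string and converts it to a formula; formula_text and f are hypothetical names.

```r
predictor <- "Girth"
other <- "Volume"

# Same expression as at Point A of regressOneOnOne.
formula_text <- paste0(other, " ~ ", predictor, " - 1")
formula_text

# as.formula() turns the string into a formula object that lm() can use.
f <- as.formula(formula_text)
class(f)
```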

  |==============================================                                 |  58%
| The remaining function, eliminate, applies regressOneOnOne to all variables except a
| given predictor and collects the residuals in a data frame. We'll first show that when
| we eliminate one regressor from the data, a regression on the remaining will produce
| their correct coefficients. (Of course, the coefficient of the eliminated regressor
| will be missing, but more about that later.)

...

  |=================================================                              |  62%
| For reference, create a model named fit, based on all three regressors, Girth, Height,
| and Constant, and assign the result to a variable named fit. Use an expression such as
| fit <- lm(Volume ~ Girth + Height + Constant -1, trees). Don't forget the -1, and be
| sure to name the model fit for later use.
fit <- lm(Volume ~ Girth + Height + Constant -1, trees)
| You are quite good my friend!

  |====================================================                           |  65%
| Now let's eliminate Girth from the data set. Call the reduced data set trees2 to
| indicate it has only 2 regressors. Use the expression trees2 <- eliminate("Girth",
| trees).
trees2 <- eliminate("Girth", trees)
| Excellent work!

  |=======================================================                        |  69%
| Use head(trees2) or View(trees2) to inspect the reduced data set.
head(trees2)
| Excellent work!

  |==========================================================                     |  73%
| Why, in trees2, is the Constant column not constant?

1: The constant, 1, has been replaced by its residual when regressed against Girth.
2: Computational precision was insufficient.
3: There must be some mistake

Selection: 1

| All that hard work is paying off!

  |=============================================================                  |  77%
| Now create a model, called fit2, using the reduced data set. Use an expression such as
| fit2 <- lm(Volume ~ Height + Constant -1, trees2). Don't forget to use -1 in the
| formula.
fit2 <- lm(Volume ~ Height + Constant -1, trees2)
| Keep working like that and you'll get there!

  |================================================================               |  81%
| Use the expression lapply(list(fit, fit2), coef) to print coefficients of fit and fit2
| for comparison.
lapply(list(fit, fit2), coef)
## [[1]]
##       Girth      Height    Constant 
##   4.7081605   0.3392512 -57.9876589 
## 
## [[2]]
##      Height    Constant 
##   0.3392512 -57.9876589
| You are really on a roll!

  |===================================================================            |  85%
| The coefficient of the eliminated variable is missing, of course. One way to get it
| would be to go back to the original data, trees, eliminate a different regressor, such
| as Height, and do another 2 variable regession, as above. There are much more
| efficient ways, but efficiency is not the point of this demonstration. We have shown
| how to reduce a regression in 3 variables to a regression in 2. We can go further and
| eliminate another variable, reducing a regression in 2 variables to a regression in 1.

...

  |======================================================================         |  88%
| Here is the final step. We have used eliminate("Height", trees2) to reduce the data to
| the outcome, Volume, and the Constant regressor. We have regressed Volume on Constant,
| and printed the coefficient as shown in the command above the answer. As you can see,
| the coefficient of Constant agrees with previous values.


Call:
lm(formula = Volume ~ Constant - 1, data = eliminate("Height", 
    trees2))

Coefficients:
Constant  
  -57.99  

...
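
The whole chain of eliminations can be reproduced outside swirl. This is a sketch under one assumption: swirl's copy of trees carries an extra column of 1's named Constant, which is added by hand here as trees1 (a hypothetical name). The two helper functions follow elimination.R, except that the formula string is converted explicitly with as.formula().

```r
# Assumption: swirl's trees has an added column of 1's called Constant.
trees1 <- cbind(datasets::trees, Constant = 1)

# Residual of `other` regressed on `predictor`, with no intercept.
regressOneOnOne <- function(predictor, other, dataframe) {
  f <- as.formula(paste0(other, " ~ ", predictor, " - 1"))
  resid(lm(f, dataframe))
}

# Replace every remaining column by its residual against `predictor`.
eliminate <- function(predictor, dataframe) {
  others <- setdiff(names(dataframe), predictor)
  as.data.frame(sapply(others, function(other) {
    regressOneOnOne(predictor, other, dataframe)
  }))
}

fit <- lm(Volume ~ Girth + Height + Constant - 1, trees1)
trees2 <- eliminate("Girth", trees1)
fit2 <- lm(Volume ~ Height + Constant - 1, trees2)
fit1 <- lm(Volume ~ Constant - 1, eliminate("Height", trees2))

# Coefficients that survive each elimination should match those of fit.
rbind(full = coef(fit)[c("Height", "Constant")],
      two = coef(fit2)[c("Height", "Constant")])
c(full = coef(fit)[["Constant"]], one = coef(fit1)[["Constant"]])
```

Each elimination removes one regressor but should leave the coefficients of the remaining ones unchanged, which is the point of the lesson.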

  |=========================================================================      |  92%
| Suppose we were given a multivariable regression problem involving an outcome and N
| regressors, where N > 1. Using only single-variable regression, how can the problem be
| reduced to a problem with only N-1 regressors?

1: Pick any regressor and replace the outcome and all other regressors by their residuals against the chosen one.
2: Subtract the mean from the outcome and each regressor.

Selection: 1

| Your dedication is inspiring!

  |============================================================================   |  96%
| We have illustrated that regression in many variables amounts to a series of
| regressions in one. The actual algorithms used by functions such as lm are more
| efficient, but are computationally equivalent to what we have done. That is, the
| algorithms use equivalent steps but combine them more efficiently and abstractly. This
| completes the lesson.

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No

Selection: 2

| You are really on a roll!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 5.

3.7 Lesson 6: MultiVar Examples

| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Regression Models
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 2

| Please choose a lesson, or type 0 to return to course menu.

 1: Introduction
 2: Residuals
 3: Least Squares Estimation
 4: Residual Variation
 5: Introduction to Multivariable Regression
 6: MultiVar Examples
 7: MultiVar Examples2
 8: MultiVar Examples3
 9: Residuals Diagnostics and Variation
10: Variance Inflation Factors
11: Overfitting and Underfitting
12: Binary Outcomes
13: Count Outcomes

Selection: 6

| Attempting to load lesson dependencies...

| Package ‘datasets’ loaded correctly!

| Package ‘stats’ loaded correctly!

| Package ‘graphics’ loaded correctly!

  |                                                                               |   0%

| MultiVar_Examples. (Slides for this and other Data Science courses may be found at
| github https://github.com/DataScienceSpecialization/courses. If you care to use them,
| they must be downloaded as a zip file and viewed locally. This lesson corresponds to
| Regression_Models/02_02_multivariateExamples.)

...

  |===                                                                            |   4%
| In this lesson, we'll look at some examples of regression models with more than one
| variable. We'll begin with the Swiss data which we've taken the liberty to load for
| you. This data is part of R's datasets package. It was gathered in 1888, a time of
| demographic change in Switzerland, and measured six quantities in 47 French-speaking
| provinces of Switzerland. We used the code from the slides (the R function pairs) to
| display here a 6 by 6 array of scatterplots showing pairwise relationships between the
| variables. All of the variables, except for fertility, are proportions of population.
| For example, "Examination" shows the percentage of draftees receiving the highest mark
| on an army exam, and "Education" the percentage of draftees with education beyond
| primary school.

...

  |=======                                                                        |   9%
| From the plot, which is NOT one of the factors measured?

1: Infant Mortality
2: Obesity
3: Catholic
4: Fertility

Selection: 2

| Excellent job!

  |==========                                                                     |  13%
| First, use the R function lm to generate the linear model "all" in which Fertility is
| the variable dependent on all the others. Use the R shorthand "." to represent the
| five independent variables in the formula passed to lm.  Remember the data is "swiss".
all <- lm(Fertility ~ . - Fertility, data = swiss)
| That's correct!
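
A note on the "." shorthand: as far as I understand, "." stands for every column of the data that is not already used in the formula, so the "- Fertility" typed above should be redundant. The sketch compares three ways of writing the same model; the names fit_dot, fit_explicit and fit_log are hypothetical.

```r
# Shorthand: all columns of swiss except the response.
fit_dot <- lm(Fertility ~ ., data = swiss)

# The same five regressors written out by name.
fit_explicit <- lm(Fertility ~ Agriculture + Examination + Education +
                     Catholic + Infant.Mortality, data = swiss)

# The form typed in the log above.
fit_log <- lm(Fertility ~ . - Fertility, data = swiss)

all.equal(coef(fit_dot), coef(fit_explicit))
all.equal(coef(fit_dot), coef(fit_log))
```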

  |==============                                                                 |  17%
| Now look at the summary of the linear model all.
summary(all)
## 
## Call:
## lm(formula = Fertility ~ . - Fertility, data = swiss)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.2743  -5.2617   0.5032   4.1198  15.3213 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      66.91518   10.70604   6.250 1.91e-07 ***
## Agriculture      -0.17211    0.07030  -2.448  0.01873 *  
## Examination      -0.25801    0.25388  -1.016  0.31546    
## Education        -0.87094    0.18303  -4.758 2.43e-05 ***
## Catholic          0.10412    0.03526   2.953  0.00519 ** 
## Infant.Mortality  1.07705    0.38172   2.822  0.00734 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.165 on 41 degrees of freedom
## Multiple R-squared:  0.7067, Adjusted R-squared:  0.671 
## F-statistic: 19.76 on 5 and 41 DF,  p-value: 5.594e-10
| Excellent work!

  |=================                                                              |  22%
| Recall that the Estimates are the coefficients of the independent variables of the
| linear model (all of which are percentages) and they reflect an estimated change in
| the dependent variable (fertility) when the corresponding independent variable
| changes. So, for every 1% increase in percent of males involved in agriculture as an
| occupation we expect a .17 decrease in fertility, holding all the other variables
| constant; for every 1% increase in Catholicism, we expect a .10 increase in fertility,
| holding all other variables constant.

...

  |=====================                                                          |  26%
| The "*" at the far end of the row indicates that the influence of Agriculture on
| Fertility is significant. At what alpha level is the t-test of Agriculture
| significant?

1: 0.01
2: 0.05
3: 0.1
4: R doesn't say

Selection: 2

| You are amazing!
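
Instead of reading the stars, the p-value can be taken from the coefficient table directly. A minimal sketch (fit_all and p_agri are hypothetical names; the log above stores the model as all):

```r
fit_all <- lm(Fertility ~ ., data = swiss)

# Row "Agriculture", column "Pr(>|t|)" of the coefficient table.
p_agri <- summary(fit_all)$coefficients["Agriculture", "Pr(>|t|)"]
p_agri

# A single "*" means the p-value lies between 0.01 and 0.05.
p_agri < 0.05
p_agri < 0.01
```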

  |========================                                                       |  30%
| Now generate the summary of another linear model (don't store it in a new variable) in
| which Fertility depends only on agriculture.
summary(lm(Fertility ~ Agriculture, data = swiss))
## 
## Call:
## lm(formula = Fertility ~ Agriculture, data = swiss)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.5374  -7.8685  -0.6362   9.0464  24.4858 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 60.30438    4.25126  14.185   <2e-16 ***
## Agriculture  0.19420    0.07671   2.532   0.0149 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.82 on 45 degrees of freedom
## Multiple R-squared:  0.1247, Adjusted R-squared:  0.1052 
## F-statistic: 6.409 on 1 and 45 DF,  p-value: 0.01492
| Keep up the great work!

  |===========================                                                    |  35%
| What is the coefficient of agriculture in this new model?

1: 0.07671
2: 60.30438
3: 0.19420
4: *

Selection: 3

| You are quite good my friend!

  |===============================                                                |  39%
| The interesting point is that the sign of the Agriculture coefficient changed from
| negative (when all the variables were included in the model) to positive (when the
| model only considered Agriculture). Obviously the presence of the other factors
| affects the influence Agriculture has on Fertility.

...

  |==================================                                             |  43%
| Let's consider the relationship between some of the factors. How would you expect
| level Education and performance on an Examination to be related?

1: They would be correlated
2: They would be uncorrelated
3: I would not be able to guess without more information

Selection: 1

| Great job!

  |======================================                                         |  48%
| Now check your intuition with the R command "cor". This computes the correlation
| between Examination and Education.
cor(swiss$Examination, swiss$Education)
## [1] 0.6984153
| Excellent work!

  |=========================================                                      |  52%
| The correlation of .6984 shows the two are correlated. Now find the correlation
| between Agriculture and Education.
cor(swiss$Agriculture, swiss$Education)
## [1] -0.6395225
| Nice work!
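
The pairwise correlations can also be computed in one call by passing several columns to cor. A small sketch (cors is a hypothetical name):

```r
# Correlation matrix of the three variables discussed so far.
cors <- cor(swiss[, c("Agriculture", "Education", "Examination")])
round(cors, 4)
```

The entries for Examination/Education and Agriculture/Education should reproduce the two values printed above.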

  |=============================================                                  |  57%
| The negative correlation (-.6395) between Agriculture and Education might be affecting
| Agriculture's influence on Fertility. I've loaded and sourced the file swissLMs.R in
| your working directory. In it is a function makelms() which generates a sequence of
| five linear models. Each model has one more independent variable than the preceding
| model, so the first has just one independent variable, Agriculture, and the last has
| all 5. I've tried loading the source code in your editor. If I haven't done this, open
| the file manually so you can look at the code.

...

Contents of swissLMs.R

makelms <- function(){
  # Store the coefficient of linear models with different independent variables
  cf <- c(coef(lm(Fertility ~ Agriculture, swiss))[2], 
          coef(lm(Fertility ~ Agriculture + Catholic,swiss))[2],
          coef(lm(Fertility ~ Agriculture + Catholic + Education,swiss))[2],
          coef(lm(Fertility ~ Agriculture + Catholic + Education + Examination,swiss))[2],
          coef(lm(Fertility ~ Agriculture + Catholic + Education + Examination +Infant.Mortality, swiss))[2])
  print(cf)
}

# Regressor generation process 1.
rgp1 <- function(){
  print("Processing. Please wait.")
  # number of samples per simulation
  n <- 100
  # number of simulations
  nosim <- 1000
  # set seed for reproducibility
  set.seed(4321)
  # Point A:
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  x3 <- rnorm(n)
  # Point B:
  betas <- sapply(1 : nosim, function(i)makelms(x1, x2, x3))
  round(apply(betas, 1, var), 5)
}

# Regressor generation process 2.
rgp2 <- function(){
  print("Processing. Please wait.")
  # number of samples per simulation
  n <- 100
  # number of simulations
  nosim <- 1000
  # set seed for reproducibility
  set.seed(4321)
  # Point C:
  x1 <- rnorm(n)
  x2 <- x1/sqrt(2) + rnorm(n) /sqrt(2)
  x3 <- x1 * 0.95 + rnorm(n) * sqrt(1 - 0.95^2)
  # Point D:
  betas <- sapply(1 : nosim, function(i)makelms(x1, x2, x3))
  round(apply(betas, 1, var), 5)
}
  |================================================                               |  61%
| Now run the function makelms() to see how the addition of variables affects the
| coefficient of Agriculture in the models.
makelms()
## Agriculture Agriculture Agriculture Agriculture Agriculture 
##   0.1942017   0.1095281  -0.2030377  -0.2206455  -0.1721140
| That's a job well done!

  |====================================================                           |  65%
| The addition of which variable changes the sign of Agriculture's coefficient from
| positive to negative?

1: Catholic
2: Education
3: Examination
4: Infant.Mortality

Selection: 2

| That's a job well done!
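
The five nested models can also be generated in a loop. The sketch below is an alternative to makelms(), not the code swirl uses: it builds each formula with reformulate() and keeps the Agriculture coefficient; vars and cf are hypothetical names.

```r
# Regressors in the order in which makelms() adds them.
vars <- c("Agriculture", "Catholic", "Education", "Examination",
          "Infant.Mortality")

# Fit the model with the first k regressors and keep Agriculture's coefficient.
cf <- sapply(seq_along(vars), function(k) {
  f <- reformulate(vars[seq_len(k)], response = "Fertility")
  coef(lm(f, data = swiss))[["Agriculture"]]
})
names(cf) <- vars
cf

# The variable whose addition first makes the coefficient negative.
vars[which(cf < 0)[1]]
```

Labelling each coefficient with the variable that was just added makes the sign change easier to locate than in the output of makelms(), where every entry is named Agriculture.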

  |=======================================================                        |  70%
| Now we'll show what happens when we add a variable that provides no new linear
| information to a model. Create a variable ec that is the sum of swiss$Examination and
| swiss$Catholic.
ec <- swiss$Examination + swiss$Catholic
| Your dedication is inspiring!

  |==========================================================                     |  74%
| Now generate a new model efit with Fertility as the dependent variable and the
| remaining 5 of the original variables AND ec as the independent variables. Use the R
| shorthand ". + ec" for the righthand side of the formula.
efit <- lm(Fertility ~ . - Fertility + ec, data = swiss)
| You're the best!

  |==============================================================                 |  78%
| We'll see that R ignores this new term since it doesn't add any information to the
| model.

...

  |=================================================================              |  83%
| Subtract the efit coefficients from the coefficients of the first model you created,
| all.
all$coefficients - efit$coefficients
##      (Intercept)      Agriculture      Examination        Education 
##                0                0                0                0 
##         Catholic Infant.Mortality               ec 
##                0                0               NA
| Keep up the great work!

  |=====================================================================          |  87%
| Which is the coefficient of ec?

1: NA
2: 0
3: I haven't a clue.

Selection: 1

| You are amazing!

  |========================================================================       |  91%
| This tells us that

1: R is really cool
2: Adding ec doesn't change the model
3: Adding ec zeroes out the coefficients

Selection: 2

| You are amazing!
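
Why the coefficient of ec is NA can be checked directly: ec is an exact linear combination of two columns already in the model, so lm cannot estimate it separately. A minimal sketch (fit_all is a hypothetical name for the model stored as all in the log):

```r
ec <- swiss$Examination + swiss$Catholic

fit_all <- lm(Fertility ~ ., data = swiss)
efit <- lm(Fertility ~ . + ec, data = swiss)

# lm reports NA for the redundant term.
is.na(coef(efit)[["ec"]])

# The rank of the fit counts the coefficients actually estimated.
efit$rank

# All other coefficients should be unchanged.
all.equal(coef(fit_all), coef(efit)[names(coef(fit_all))])
```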

  |============================================================================   |  96%
| Congrats! You've concluded this first lesson on multivariable linear models.

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

| You got it!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 6.

3.8 Lesson 7: MultiVar Examples2


| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Regression Models
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 2

| Please choose a lesson, or type 0 to return to course menu.

 1: Introduction
 2: Residuals
 3: Least Squares Estimation
 4: Residual Variation
 5: Introduction to Multivariable Regression
 6: MultiVar Examples
 7: MultiVar Examples2
 8: MultiVar Examples3
 9: Residuals Diagnostics and Variation
10: Variance Inflation Factors
11: Overfitting and Underfitting
12: Binary Outcomes
13: Count Outcomes

Selection: 7

| Attempting to load lesson dependencies...

| Package ‘datasets’ loaded correctly!

| Package ‘stats’ loaded correctly!

| Package ‘graphics’ loaded correctly!

  |                                                                               |   0%

| MultiVar_Examples2. (Slides for this and other Data Science courses may be found at
| github https://github.com/DataScienceSpecialization/courses. If you care to use them,
| they must be downloaded as a zip file and viewed locally. This lesson corresponds to
| Regression_Models/02_02_multivariateExamples.)

...

  |==                                                                             |   3%
| This is the second lesson in which we'll look at some regression models with more than
| one independent variable. We'll begin with the InsectSprays data which we've taken the
| liberty to load for you. This data is part of R's datasets package. It shows the
| effectiveness of different insect sprays. We've used the code from the slides to show
| you a boxplot of the data.

...

  |=====                                                                          |   6%
| How many Insect Sprays are in this dataset?
str(InsectSprays)
## 'data.frame':    72 obs. of  2 variables:
##  $ count: num  10 7 20 14 14 12 10 23 17 20 ...
##  $ spray: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
| That's the answer I was looking for.

  |=======                                                                        |   9%
| From the boxplot, which spray has the largest median?

ANSWER: 6

| You almost had it, but not quite. Try again.

| The median of each spray is indicated by the thick horizontal line in each box. Which
| spray has its median furthest from 0?

ANSWER: B

| That's correct!

  |==========                                                                     |  12%
| Let's first try to get a better understanding of the dataset InsectSprays. Use the R
| function dim to find the dimensions of the data.
dim(InsectSprays)
## [1] 72  2
| All that hard work is paying off!

  |============                                                                   |  15%
| The R function dim says that InsectSprays is a 72 by 2 array. Use the R function head
| to look at the first 15 elements of InsectSprays.
head(InsectSprays,15)
| You are amazing!

  |==============                                                                 |  18%
| So this dataset contains 72 counts, each associated with a particular different spray.
| The counts are in the first column and a letter identifying the spray in the second.
| To save you some typing we've created 6 arrays with just the count data for each
| spray. The arrays have the names sx, where x is A,B,C,D,E or F. Type one of the names
| (your choice) of these arrays to see what we're talking about.

> sA
 [1] 10  7 20 14 14 12 10 23 17 20 14 13

| You're the best!

  |=================                                                              |  21%
| As a check, run the R command summary on the second column of the dataset to see how
| many entries we have for each spray.  (Recall that the expression M[ ,2] yields the
| second column of the array M.)
summary(InsectSprays[,2])
| All that hard work is paying off!
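
The vectors sA to sF exist only inside the swirl session. Outside swirl they can be rebuilt, for example with split(); this is a guess at an equivalent construction, not swirl's own code, and counts_by_spray is a hypothetical name.

```r
# One vector of counts per spray, collected in a named list.
counts_by_spray <- split(InsectSprays$count, InsectSprays$spray)
sA <- counts_by_spray$A
sA

# Number of observations per spray, the same information as summary() on the factor.
table(InsectSprays$spray)
```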

  |===================                                                            |  24%
| It's not surprising that with 72 counts we'd have 12 count for each of the 6 sprays.
| In this lesson we'll consider multilevel factor levels and how we interpret linear
| models of data with more than 2 factors.

...

  |======================                                                         |  27%
| Use the R function sapply to find out the classes of the columns of the data.
sapply(InsectSprays,class)
##     count     spray 
## "numeric"  "factor"
| You are really on a roll!

  |========================                                                       |  30%
| The class of the second "spray" column is factor. Recall from the slides that the
| equation representing the relationship between a particular outcome and several
| factors contains binary variables, one for each factor. This data has 6 factors so we
| need 6 dummy variables. Each will indicate if a particular outcome (a count) is
| associated with a specific factor or category (insect spray).

...
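
The dummy variables that lm builds from a factor can be inspected with model.matrix(). A small sketch (X is a hypothetical name); note that with an intercept in the model, R should create a column of 1's plus five dummy columns, with spray A absorbed into the intercept.

```r
# Design matrix that lm uses for count ~ spray.
X <- model.matrix(count ~ spray, data = InsectSprays)
colnames(X)

# Rows 1, 13 and 25 are the first observations of sprays A, B and C.
X[c(1, 13, 25), ]
```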

  |==========================                                                     |  33%
| Using R's lm function, generate the linear model in which count is the dependent
| variable and spray is the independent. Recall that in R formula has the form y ~ x,
| where y depends on the predictor x. The data set is InsectSprays. Store the model in
| the variable fit.
fit <- lm(count ~ spray, data = InsectSprays)
| Perseverance, that's the answer.

  |=============================                                                  |  36%
| Using R's summary function, look at the coefficients of the model. Recall that these
| can be accessed with the R construct x$coef.
summary(fit)$coef
##                Estimate Std. Error    t value     Pr(>|t|)
## (Intercept)  14.5000000   1.132156 12.8074279 1.470512e-19
## sprayB        0.8333333   1.601110  0.5204724 6.044761e-01
## sprayC      -12.4166667   1.601110 -7.7550382 7.266893e-11
## sprayD       -9.5833333   1.601110 -5.9854322 9.816910e-08
## sprayE      -11.0000000   1.601110 -6.8702352 2.753922e-09
## sprayF        2.1666667   1.601110  1.3532281 1.805998e-01
| You are quite good my friend!

  |===============================                                                |  39%
| Notice that R returns a 6 by 4 array. For convenience, store off the first column of
| this array, the Estimate column, in a variable called est. Remember the R construct
| for accessing the first column is x[,1].
est <- summary(fit)$coef[,1]
| You're the best!

  |==================================                                             |  42%
| Notice that sprayA does not appear explicitly in the list of Estimates. It is there,
| however, as the first entry in the Estimate column. It is labeled as "(Intercept)".
| That is because sprayA is the first in the alphabetical list of the levels of the
| factor, and R by default uses the first level as the reference against which the other
| levels or groups are compared when doing its t-tests (shown in the third column).

...

  |====================================                                           |  45%
| What do the Estimates of this model represent? Of course they are the coefficients of
| the binary or dummy variables associated with sprays. More importantly, the Intercept
| is the mean of the reference group, in this case sprayA, and the other Estimates are
| the distances of the other groups' means from the reference mean. Let's verify these
| claims now. First compute the mean of the sprayA counts. Remember the counts are all
| stored in the vectors named sx. Now we're interested in finding the mean of sA.

> mean(sA)
[1] 14.5

| That's the answer I was looking for.

  |======================================                                         |  48%
| What do you think the mean of sprayB is?

1: 15.3333
2: I haven't a clue
3: 0.83333
4: -12.41667

Selection: 3

| You almost had it, but not quite. Try again.

| Adding the value of the Intercept to the Estimate for sprayB yields the empirical mean
| of sprayB.

1: -12.41667
2: 0.83333
3: 15.3333
4: I haven't a clue

Selection: 3

| All that practice is paying off!

  |=========================================                                      |  52%
| Verify this now by using R's mean function to compute the mean of sprayB.

> mean(sB)
[1] 15.33333

| You got it!
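
The same check can be done for all sprays at once. A minimal sketch using tapply() to compute the group means (group_means is a hypothetical name):

```r
fit <- lm(count ~ spray, data = InsectSprays)
est <- summary(fit)$coef[, 1]

# Mean count for each spray.
group_means <- tapply(InsectSprays$count, InsectSprays$spray, mean)
group_means

# Intercept plus each offset should reproduce the means of sprays B to F.
est[1] + est[-1]
```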

  |===========================================                                    |  55%
| Let's generate another model of this data, this time omitting the intercept. We can
| easily use R's lm function to do this by appending " - 1" to the formula, e.g., count
| ~ spray - 1. This tells R to omit the first level. Do this now and store the new model
| in the variable nfit.
nfit <- lm(count ~ spray - 1, data = InsectSprays)
| You got it right!

  |=============================================                                  |  58%
| Now, as before, look at the coefficient portion of the summary of nfit.
summary(nfit)$coef
| You are quite good my friend!

  |================================================                               |  61%
| Notice that sprayA now appears explicitly in the list of Estimates. Also notice how
| the values of the columns have changed. The means of all the groups are now explicitly
| shown in the Estimate column. Remember that previously, with an intercept, sprayA was
| excluded, its mean was the intercept, and the values for the other sprays (estimates,
| standard errors, and t-tests) were all computed relative to sprayA, the reference
| group. Omitting the intercept clearly affected the model.

...

  |==================================================                             |  64%
| What values does the Estimate column now show?

1: The variances of all 6 levels
2: I have no idea
3: The means of all 6 levels

Selection: 3

| All that hard work is paying off!
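
As a rough check of the point above (not part of the swirl lesson), the following sketch compares the coefficients of the no-intercept model with the per-spray means. It uses only the InsectSprays data set that ships with R. The name group_means is introduced here for illustration.

```r
# InsectSprays is part of R's built-in datasets package.
nfit <- lm(count ~ spray - 1, data = InsectSprays)

# Mean count for each spray, computed directly from the data.
group_means <- tapply(InsectSprays$count, InsectSprays$spray, mean)

# Without an intercept, each coefficient is the mean of its own group,
# so this comparison should report TRUE.
print(all.equal(unname(coef(nfit)), as.vector(group_means)))
```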

  |=====================================================                          |  67%
| Without an intercept (reference group) the tests are whether the expected counts (the
| groups means) are different from zero. Which spray has the least significant result?

1: sprayA
2: sprayB
3: sprayC
4: sprayF

Selection: 3

| You are doing so well!

  |=======================================================                        |  70%
| Clearly, which level is first is important to the model. If you wanted a different
| reference group, for instance, to compare sprayB to sprayC, you could refit the model
| with a different reference group.

...

  |=========================================================                      |  73%
| The R function relevel does precisely this. It re-orders the levels of a factor. We'll
| do this now. We'll call relevel with two arguments. The first is the factor, in this
| case InsectSprays$spray, and the second is the level that we want to be first, in this
| case "C". Store the result in a new variable spray2.

> spray2 <- relevel(InsectSprays$spray, ref = "C")

| That's correct!

  |============================================================                   |  76%
| Now generate a new linear model and put the result in the variable fit2.

> fit2 <- lm(count ~ spray2, data = InsectSprays)

| You are doing so well!

  |==============================================================                 |  79%
| As before, look at the coef portion of the summary of this new model fit2. See how
| sprayC is now the intercept (since it doesn't appear explicitly in the list).

> summary(fit2)$coef
             Estimate Std. Error  t value     Pr(>|t|)
(Intercept)  2.083333   1.132156 1.840148 7.024334e-02
spray2A     12.416667   1.601110 7.755038 7.266893e-11
spray2B     13.250000   1.601110 8.275511 8.509776e-12
spray2D      2.833333   1.601110 1.769606 8.141205e-02
spray2E      1.416667   1.601110 0.884803 3.794750e-01
spray2F     14.583333   1.601110 9.108266 2.794343e-13

| All that hard work is paying off!

  |=================================================================              |  82%
| According to this new model what is the mean of spray2C?

1: The model doesn't tell me.
2: 12.416667
3: 2.083333
4: 14.583333

Selection: 3

| Keep up the great work!

  |===================================================================            |  85%
| Verify your answer with R's mean function using the array sC as the argument.

> mean(sC)
[1] 2.083333

| Excellent job!

  |=====================================================================          |  88%
| According to this new model what is the mean of spray2A?

1: 14.583333
2: 12.416667
3: 14.50000
4: I don't have a clue

Selection: 3

| You are quite good my friend!

  |========================================================================       |  91%
| Remember that with this model sprayC is the reference group, so the t-test statistics
| (shown in column 3 of the summary coefficients) compare the other sprays to sprayC.
| These can be computed by hand using the Estimates and standard error from the original
| model (fit) which used sprayA as the references.

...

  |==========================================================================     |  94%
| The slides show the details of this but here we'll demonstrate by calculating the
| spray2B t value.  Subtract fit's sprayC coefficient (fit$coef[3]) from sprayB's
| (fit$coef[2]) and divide by the standard error which we saw was 1.6011. The result is
| spray2B's t value. Do this now.

> (fit$coef[2]-fit$coef[3])/1.6011
  sprayB 
8.275561 

| You got it right!
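
The hand calculation above uses the rounded standard error 1.6011, which is why its result (8.275561) differs slightly from the 8.275511 shown in the summary of fit2. The sketch below, which is not part of the swirl lesson, repeats the calculation with the unrounded standard error taken from the original model. The names se, t_by_hand and t_from_fit2 are introduced here for illustration. Reusing the sprayB standard error works because every spray has the same number of observations.

```r
# Original model: sprayA is the reference group.
fit <- lm(count ~ spray, data = InsectSprays)

# Releveled model: sprayC is the reference group.
spray2 <- relevel(InsectSprays$spray, ref = "C")
fit2 <- lm(count ~ spray2, data = InsectSprays)

# Unrounded standard error of the sprayB coefficient in the original model.
se <- coef(summary(fit))["sprayB", "Std. Error"]

# t value for "sprayB versus sprayC", computed by hand from fit.
t_by_hand <- (coef(fit)[["sprayB"]] - coef(fit)[["sprayC"]]) / se

# The same t value as reported by the releveled model.
t_from_fit2 <- coef(summary(fit2))["spray2B", "t value"]

print(all.equal(t_by_hand, t_from_fit2))
```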

  |=============================================================================  |  97%
| We glossed over some details in this lesson. For instance, counts can never be negative so
| the assumption of normality is violated. We'll explore this issue more when we discuss
| Poisson GLMs. For now be glad that you've concluded this second lesson on
| multivariable linear models.

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No

Selection: 2

| You nailed it! Good job!

| You've reached the end of this lesson! Returning to the main menu...

This is the end of Lesson 7.

3.9 Lesson 8: MultiVar Examples3

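This lesson uses the following data from the World Health Organization (WHO), a United Nations agency.

The data can also be obtained with the method below, but the resulting dimensions (dim) do not match what the lesson expects, so we use the data that swirl installed.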

# Option 1: read the CSV directly from the WHO Global Health Observatory URL.
url <- "http://apps.who.int/gho/athena/data/GHO/WHOSIS_000008.csv?profile=text&filter=COUNTRY:*;SEX:*"
hunger <- read.csv(url)
# Keep only the rows that record a single sex (drop "Both sexes").
hunger <- hunger[hunger$Sex!="Both sexes",]
head(hunger)
dim(hunger)
## [1] 1117   12
names(hunger)
##  [1] "Indicator"      "Data.Source"    "PUBLISH.STATES" "Year"          
##  [5] "WHO.region"     "Country"        "Sex"            "Display.Value" 
##  [9] "Numeric"        "Low"            "High"           "Comments"
# Option 2: download the file once (uncomment the next line), then read the local copy.
#download.file("http://apps.who.int/gho/athena/data/GHO/WHOSIS_000008.csv?profile=text&filter=COUNTRY:*;SEX:*","hunger.csv",method="curl")
hunger <- read.csv("hunger.csv")
hunger <- hunger[hunger$Sex!="Both sexes",]
head(hunger)

| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Regression Models
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 2

| Please choose a lesson, or type 0 to return to course menu.

 1: Introduction
 2: Residuals
 3: Least Squares Estimation
 4: Residual Variation
 5: Introduction to Multivariable Regression
 6: MultiVar Examples
 7: MultiVar Examples2
 8: MultiVar Examples3
 9: Residuals Diagnostics and Variation
10: Variance Inflation Factors
11: Overfitting and Underfitting
12: Binary Outcomes
13: Count Outcomes

Selection: 8

| Attempting to load lesson dependencies...

| Package ‘datasets’ loaded correctly!

| Package ‘stats’ loaded correctly!

| Package ‘graphics’ loaded correctly!

  |                                                                               |   0%

| MultiVar_Examples3. (Slides for this and other Data Science courses may be found at
| github https://github.com/DataScienceSpecialization/courses. If you care to use them,
| they must be downloaded as a zip file and viewed locally. This lesson corresponds to
| Regression_Models/02_02_multivariateExamples.)

...

  |==                                                                             |   3%
| This is the third and final lesson in which we'll look at regression models with more
| than one independent variable or predictor. We'll begin with WHO hunger data which
| we've taken the liberty to load for you. WHO is the World Health Organization and this
| data concerns young children from around the world and rates of hunger among them
| which the organization compiled over a number of years. The original csv file was very
| large and we've subsetted just the rows which identify the gender of the child as
| either male or female. We've read the data into the data frame "hunger" for you, so
| you can easily access it.

...

  |====                                                                           |   5%
| As we did in the last lesson let's first try to get a better understanding of the
| dataset. Use the R function dim to find the dimensions of hunger.

> dim(hunger)
[1] 948  13

| That's correct!

  |======                                                                         |   8%
| How many samples does hunger have?

> 948
[1] 948

| You are quite good my friend!

  |=========                                                                      |  11%
| Now use the R function names to find out what the 13 columns of hunger represent.

> names(hunger)
 [1] "X"              "Indicator"      "Data.Source"    "PUBLISH.STATES" "Year"          
 [6] "WHO.region"     "Country"        "Sex"            "Display.Value"  "Numeric"       
[11] "Low"            "High"           "Comments"      

| Keep up the great work!

  |===========                                                                    |  14%
| The Numeric column for a particular row tells us the percentage of children under age
| 5 who were underweight when that sample was taken. This is one of the columns we'll be
| focussing on in this lesson. It will be the outcome (dependent variable) for the
| models we generate.

...

  |=============                                                                  |  16%
| Let's first look at the rate of hunger and see how it's changed over time. Use the R
| function lm to generate the linear model in which the rate of hunger, Numeric, depends
| on the predictor, Year. Put the result in the variable fit.

> fit <-  lm(Numeric ~ Year, data = hunger)

| All that hard work is paying off!

  |===============                                                                |  19%
| Now look at the coef portion of the summary of fit.

> summary(fit)$coef
              Estimate  Std. Error   t value     Pr(>|t|)
(Intercept) 634.479660 121.1445995  5.237375 2.007699e-07
Year         -0.308397   0.0605292 -5.095012 4.209412e-07

| That's the answer I was looking for.

  |=================                                                              |  22%
| What is the coefficient of hunger$Year?

1: -0.30840
2: 0.06053
3: 121.14460
4: 634.47966

Selection: 1

| Excellent work!

  |===================                                                            |  24%
| What does the negative Estimate of hunger$Year show?

1: As time goes on, the rate of hunger decreases
2: As time goes on, the rate of hunger increases
3: I haven't a clue

Selection: 1

| All that practice is paying off!

  |=====================                                                          |  27%
| What does the intercept of the model represent?

1: the number of children questioned in the survey
2: the percentage of hungry children at year 0
3: the number of hungry children at year 0

Selection: 3

| One more time. You can do it!

| Numeric gives a percentage of hungry children, and an intercept is the point at which
| a line intersects the axis. The axis represents a 0 value.

1: the number of children questioned in the survey
2: the percentage of hungry children at year 0
3: the number of hungry children at year 0

Selection: 2

| Excellent job!

  |=======================                                                        |  30%
| Now let's use R's subsetting capability to look at the rates of hunger for the
| different genders to see how, or even if, they differ.  Once again use the R function
| lm to generate the linear model in which the rate of hunger (Numeric) for female
| children depends on Year. Put the result in the variable lmF. You'll have to use the R
| construct x[hunger$Sex=="Female"] to pick out both the correct Numerics and the
| correct Years.

> lmF <- lm(Numeric[hunger$Sex=="Female"] ~ Year, data = hunger)
Error in model.frame.default(formula = Numeric[hunger$Sex == "Female"] ~  : 
  variable lengths differ (found for 'Year')
> lmF <- lm(Numeric ~ Year, data = hunger[hunger$Sex=="Female"])
Error in `[.data.frame`(hunger, hunger$Sex == "Female") : 
  undefined columns selected
> lmF <- lm(Numeric[hunger$Sex=="Female"] ~ Year[hunger$Sex=="Female"], data = hunger)

| All that practice is paying off!
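
The two errors above have simple causes. In the first attempt only Numeric was subset, so it no longer had the same length as Year. In the second, hunger[cond] without a comma asks for columns rather than rows. The sketch below shows three equivalent ways to fit a model on one group. Because the hunger data frame is only available inside swirl, it uses a small synthetic data frame, toy, invented here purely for illustration. The model names lmF_a, lmF_b and lmF_c are likewise hypothetical, and swirl itself may only accept the form it asks for.

```r
# Synthetic stand-in for the hunger data (NOT the WHO data).
set.seed(1)
toy <- data.frame(
  Year = rep(1990:2009, times = 2),
  Sex  = factor(rep(c("Female", "Male"), each = 20))
)
toy$Numeric <- 30 - 0.3 * (toy$Year - 1990) + 2 * (toy$Sex == "Male") + rnorm(40)

# (a) Subset both variables inside the formula, as in the lesson.
lmF_a <- lm(Numeric[Sex == "Female"] ~ Year[Sex == "Female"], data = toy)

# (b) Subset the rows of the data frame; note the comma after the condition.
lmF_b <- lm(Numeric ~ Year, data = toy[toy$Sex == "Female", ])

# (c) Use lm's subset argument.
lmF_c <- lm(Numeric ~ Year, data = toy, subset = Sex == "Female")

# All three fits give the same intercept and slope
# (the coefficient names differ for (a), hence unname).
print(all.equal(unname(coef(lmF_a)), unname(coef(lmF_b))))
print(all.equal(coef(lmF_b), coef(lmF_c)))
```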

  |==========================                                                     |  32%
| Do the same for male children and put the result in lmM.

> lmM <- lm(Numeric[hunger$Sex=="Male"] ~ Year[hunger$Sex=="Male"], data = hunger)

| Keep working like that and you'll get there!

  |============================                                                   |  35%
| Now we'll plot the data points and fitted lines using different colors to distinguish
| between males (blue) and females (pink).

...

  |==============================                                                 |  38%
| We can see from the plot that the lines are not exactly parallel. On the right side of
| the graph (around the year 2010) they are closer together than on the left side
| (around 1970). Since they aren't parallel, their slopes must be different, though both
| are negative. Of the following R expressions which would confirm that the slope for
| males is negative?

1: lmM$coef[1]
2: lmM$coef[2]
3: lmF$coef[2]

Selection: 2

| Nice work!

  |================================                                               |  41%
| Now instead of separating the data by subsetting the samples by gender we'll use
| gender as another predictor to create the linear model lmBoth. Recall that to do this
| in R we place a plus sign "+" between the independent variables, so the formula looks
| like dependent ~ independent1 + independent2.

...

  |==================================                                             |  43%
| Create lmBoth now. Numeric is the dependent, Year and Sex are the independent
| variables. The data is "hunger". For lmBoth, make sure Year is first and Sex is
| second.

> lmBoth <-  lm(Numeric ~ Year + Sex, data = hunger)

| You're the best!

  |====================================                                           |  46%
| Now look at the summary of lmBoth with the R command summary.

> summary(lmBoth)

Call:
lm(formula = Numeric ~ Year + Sex, data = hunger)

Residuals:
    Min      1Q  Median      3Q     Max 
-25.472 -11.297  -1.848   7.058  45.990 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 633.5283   120.8950   5.240 1.98e-07 ***
Year         -0.3084     0.0604  -5.106 3.99e-07 ***
SexMale       1.9027     0.8576   2.219   0.0267 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 13.2 on 945 degrees of freedom
Multiple R-squared:  0.03175,   Adjusted R-squared:  0.0297 
F-statistic: 15.49 on 2 and 945 DF,  p-value: 2.392e-07


| You are really on a roll!

  |======================================                                         |  49%
| Notice that three estimates are given, the intercept, one for Year and one for Male.
| What happened to the estimate for Female? Note that Male and Female are categorical
| variables hence they are factors in this model. Recall from the last lesson (and
| slides) that R treats the first (alphabetical) factor as the reference and its
| estimate is the intercept which represents the percentage of hungry females at year 0.
| The estimate given for the factor Male is a distance from the intercept (the estimate
| of the reference group Female). To calculate the percentage of hungry males at year 0
| you have to add together the intercept and the male estimate given by the model.

...

  |=========================================                                      |  51%
| What percentage of young Males were hungry at year 0?

1: 1.9027
2: 635.431
3: I can't tell since the data starts at 1970.
4: 633.2199

Selection: 2

| Perseverance, that's the answer.
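
With the values from the summary above, the answer is 633.5283 + 1.9027 = 635.4310. The sketch below illustrates the same rule on a small synthetic data frame, toy, invented here because the hunger data is only available inside swirl. It checks that adding the intercept and the SexMale estimate gives the model's prediction for males at year 0. The names male_at_0 and newdata are hypothetical.

```r
# Synthetic stand-in for the hunger data (NOT the WHO data).
set.seed(1)
toy <- data.frame(
  Year = rep(1990:2009, times = 2),
  Sex  = factor(rep(c("Female", "Male"), each = 20))
)
toy$Numeric <- 30 - 0.3 * (toy$Year - 1990) + 2 * (toy$Sex == "Male") + rnorm(40)

lmBoth <- lm(Numeric ~ Year + Sex, data = toy)
b <- coef(lmBoth)

# Female is the reference level, so the male value at year 0 is
# the intercept plus the SexMale offset.
male_at_0 <- b[["(Intercept)"]] + b[["SexMale"]]

# The same quantity obtained from predict().
newdata <- data.frame(Year = 0, Sex = factor("Male", levels = levels(toy$Sex)))
print(all.equal(male_at_0, unname(predict(lmBoth, newdata))))
```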

  |===========================================                                    |  54%
| What does the estimate for hunger$Year represent?

1: the annual decrease in percentage of hungry children of both genders
2: the annual decrease in percentage of hungry females
3: the annual decrease in percentage of hungry males

Selection: 1

| Great job!

  |=============================================                                  |  57%
| Now we'll replot the data points along with two new lines using different colors.  The
| red line will have the female intercept and the blue line will have the male
| intercept.

...

  |===============================================                                |  59%
| The lines appear parallel. This is because

1: they have slopes that are very close
2: they have the same slope
3: I have no idea

Selection: 2

| Great job!

  |=================================================                              |  62%
| Now we'll consider the interaction between year and gender to see how that affects
| changes in rates of hunger. To do this we'll add a third term to the predictor portion
| of our model formula, the product of year and gender.

...

  |===================================================                            |  65%
| Create the model lmInter. Numeric is the outcome and the three predictors are Year,
| Sex, and Sex*Year. The data is "hunger".

> lmInter <-  lm(Numeric ~ Year + Sex + Sex * Year, data = hunger)

| That's the answer I was looking for.

  |=====================================================                          |  68%
| Now look at the summary of lmInter with the R command summary.

> summary(lmInter)

Call:
lm(formula = Numeric ~ Year + Sex + Sex * Year, data = hunger)

Residuals:
    Min      1Q  Median      3Q     Max 
-25.913 -11.248  -1.853   7.087  46.146 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  603.50580  171.05519   3.528 0.000439 ***
Year          -0.29340    0.08547  -3.433 0.000623 ***
SexMale       61.94772  241.90858   0.256 0.797946    
Year:SexMale  -0.03000    0.12087  -0.248 0.804022    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 13.21 on 944 degrees of freedom
Multiple R-squared:  0.03181,   Adjusted R-squared:  0.02874 
F-statistic: 10.34 on 3 and 944 DF,  p-value: 1.064e-06


| All that practice is paying off!

  |========================================================                       |  70%
| What is the percentage of hungry females at year 0?

1: 603.5058
2: The model doesn't say.
3: 61.94772

Selection: 1

| Great job!

  |==========================================================                     |  73%
| What is the percentage of hungry males at year 0?

1: The model doesn't say.
2: 665.4535
3: 603.5058
4: 61.94772

Selection: 2

| That's correct!

  |============================================================                   |  76%
| What is the annual change in percentage of hungry females?

1: -0.29340
2: 0.08547
3: The model doesn't say.
4: -0.03000

Selection: 1

| Perseverance, that's the answer.

  |==============================================================                 |  78%
| What is the annual change in percentage of hungry males?

1: -0.03000
2: -0.32340
3: 0.12087
4: The model doesn't say.

Selection: 2

| That's correct!
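
The last four answers all follow from the same rule: the female values are the Intercept and Year estimates, and the male values add the SexMale and Year:SexMale offsets (603.50580 + 61.94772 = 665.45352 and -0.29340 + (-0.03000) = -0.32340). The sketch below illustrates this on a synthetic data frame, toy, invented here because the hunger data is only available inside swirl. It also checks that the male line from the interaction model matches a separate fit to the male rows alone. The formula Year * Sex is R shorthand for Year + Sex + Year:Sex.

```r
# Synthetic stand-in for the hunger data (NOT the WHO data).
set.seed(1)
toy <- data.frame(
  Year = rep(1990:2009, times = 2),
  Sex  = factor(rep(c("Female", "Male"), each = 20))
)
toy$Numeric <- 30 - 0.3 * (toy$Year - 1990) + 2 * (toy$Sex == "Male") + rnorm(40)

# Interaction model and a separate fit for males only.
lmInter <- lm(Numeric ~ Year * Sex, data = toy)
lmM     <- lm(Numeric ~ Year, data = toy, subset = Sex == "Male")

b <- coef(lmInter)
male_intercept <- b[["(Intercept)"]] + b[["SexMale"]]
male_slope     <- b[["Year"]] + b[["Year:SexMale"]]

# Both comparisons should report TRUE.
print(all.equal(male_intercept, coef(lmM)[["(Intercept)"]]))
print(all.equal(male_slope, coef(lmM)[["Year"]]))
```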

  |================================================================               |  81%
| Now we'll replot the data points along with two new lines using different colors to
| distinguish between the genders.

...

  |==================================================================             |  84%
| Which line has the steeper slope?

1: They look about the same
2: Male
3: Female

Selection: 2

| That's the answer I was looking for.

  |====================================================================           |  86%
| Finally, we note that things are a little trickier when we're dealing with an
| interaction between predictors which are continuous (and not factors). The slides show
| the underlying algebra, but we can summarize.

...

  |======================================================================         |  89%
| Suppose we have two interacting predictors and one of them is held constant. The
| expected change in the outcome for a unit change in the other predictor is the
| coefficient of that changing predictor + the coefficient of the interaction * the
| value of the predictor held constant.

...

  |=========================================================================      |  92%
| Suppose the linear model is Hi = b0 + (b1*Ii) + (b2*Yi)+ (b3*Ii*Yi) + ei. Here the H's
| represent the outcomes, the I's and Y's the predictors, neither of which is a
| category, and the b's represent the estimated coefficients of the predictors. We can
| ignore the e's which represent the residuals of the model. This equation models a
| continuous interaction since neither I nor Y is a category or factor. Suppose we fix I
| at some value and let Y vary.

...

  |===========================================================================    |  95%
| Which expression represents the change in H per unit change in Y given that I is fixed
| at 5?

1: b0+b2
2: b2+b3*Y
3: b2+b3*5
4: b1+5*b3

Selection: 3

| Excellent job!
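
A small numeric check may make the answer b2+b3*5 more concrete. The coefficient values below are arbitrary numbers chosen for illustration and do not come from any fitted model. The function H follows the equation in the lesson with the residual term dropped.

```r
# Arbitrary illustrative coefficients (not estimates from real data).
b0 <- 1; b1 <- 2; b2 <- 3; b3 <- 0.5

# H = b0 + b1*I + b2*Y + b3*I*Y, ignoring the residual term.
H <- function(I, Y) b0 + b1 * I + b2 * Y + b3 * I * Y

# Change in H when Y increases by one unit while I is held at 5.
change <- H(5, 11) - H(5, 10)

print(change)        # 5.5
print(b2 + b3 * 5)   # 5.5
```

The starting value of Y (10 here) does not matter: with I fixed, H is linear in Y, so the change per unit of Y is the same everywhere.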

  |=============================================================================  |  97%
| Congratulations! You've finished this final lesson in multivariable regression models.

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

| You are quite good my friend!

| You've reached the end of this lesson! Returning to the main menu...

This is the end of Lesson 8.

3.10 Lesson 9: Residuals Diagnostics and Variation

In this lesson we learn about cases where an outlier has a large effect on a linear regression and cases where it has almost no effect.

  • influence.measures {stats}: Regression Deletion Diagnostics
    • This suite of functions can be used to compute some of the regression (leave-one-out deletion) diagnostics for linear and generalized linear models discussed in Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982), etc.

| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Regression Models
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 2

| Please choose a lesson, or type 0 to return to course menu.

 1: Introduction
 2: Residuals
 3: Least Squares Estimation
 4: Residual Variation
 5: Introduction to Multivariable Regression
 6: MultiVar Examples
 7: MultiVar Examples2
 8: MultiVar Examples3
 9: Residuals Diagnostics and Variation
10: Variance Inflation Factors
11: Overfitting and Underfitting
12: Binary Outcomes
13: Count Outcomes

Selection: 9

| Attempting to load lesson dependencies...

| Package ‘datasets’ loaded correctly!

  |                                                                               |   0%

| Residuals, Diagnostics, and Variation. (Slides for this and other Data Science courses
| may be found at github https://github.com/DataScienceSpecialization/courses. If you
| care to use them, they must be downloaded as a zip file and viewed locally. This
| lesson corresponds to Regression_Models/02_04_residuals_variation_diagnostics.)

...

  |==                                                                             |   3%
| In the accompanying figure there is a fairly obvious outlier. However obvious, it does
| not affect the fit very much as can be seen by comparing the orange line with the
| black. The orange line represents a fit in which the outlier is included in the data
| set, and the black line represents a fit in which the outlier is excluded. Including
| this outlier does not change the fit very much, so it is said to lack influence.

...

  |=====                                                                          |   6%
| This next figure also has a fairly obvious outlier, but in this case including the
| outlier changes the fit a great deal. The slope and the residuals of the orange line
| are very different than those of the black line. This outlier is said to be
| influential.

...

  |=======                                                                        |   9%
| Outliers may or may not belong in the data. They may represent real events or they may
| be spurious. In any case, they should be examined. In order to spot them, R provides
| various diagnostic plots and measures of influence. In this lesson we'll illustrate
| their meanings and use. The basic technique is to examine the effects of leaving one
| sample out, as we did in comparing the black and orange lines above. We'll use the
| influential outlier to illustrate, since leaving it out has clear effects.

...

  |==========                                                                     |  12%
| The influential outlier is in a data frame named out2. It has two columns, labeled y
| and x, respectively. To begin, create a model named fit using fit <- lm(y ~ x, out2)
| or an equivalent expression.

> fit <- lm(y ~ x, out2)

| Perseverance, that's the answer.

  |============                                                                   |  15%
| The simplest diagnostic plot displays residuals versus fitted values. Residuals should
| be uncorrelated with the fit, independent and (almost) identically distributed with
| mean zero. Enter plot(fit, which=1) at the R prompt to see if this is the case.

> plot(fit, which=1)

| You are really on a roll!

  |==============                                                                 |  18%
| Do the residuals appear uncorrelated with the fit?

1: No. There is a linear pattern involving all but one residual and the fit.
2: Yes

Selection: 1

| Your dedication is inspiring!

  |=================                                                              |  21%
| The Residuals vs Fitted plot labels certain points with their row names or numbers,
| numbers in our case. Which of the three labeled points would you guess is our
| influential outlier?

1: 1
2: 50
3: 13

Selection: 1

| Keep working like that and you'll get there!

  |===================                                                            |  24%
| Our influential outlier is in row 1 of the data. To exclude it is just a matter of using
| out2[-1, ] rather than out2 as data. Create a second model, named fitno for 'fit with
| no outlier', which excludes the outlier.

> fitno <- lm(y ~ x, out2[-1,])

| Your dedication is inspiring!

  |======================                                                         |  27%
| Display a Residuals vs Fitted plot for fitno. Remember to use which=1.

> plot(fitno, which=1)

| You got it right!

  |========================                                                       |  30%
| This plot has none of the patterned appearance of the first. It looks as we would
| expect if residuals were independently and (almost) identically distributed with zero
| mean, and were uncorrelated with the fit.

...

  |==========================                                                     |  33%
| The change which inclusion or exclusion of a sample induces in coefficients is a simple
| measure of its influence. Subtract coef(fitno) from coef(fit) to see the change
| induced by including the influential first sample.

> coef(fit) - coef(fitno)
(Intercept)           x 
-0.01167866 -0.53363019 

| Perseverance, that's the answer.

  |=============================                                                  |  36%
| dfbeta: The function, dfbeta, does the equivalent calculation for every sample in the
| data. The first row of dfbeta(fit) should match the difference we've just calculated.
| The second row is a similar calculation for the second sample, and so on. Since dfbeta
| returns a large matrix, use either head(dfbeta(fit)) or View(dfbeta(fit)) to examine
| the result.

> head(dfbeta(fit))
   (Intercept)             x
1 -0.011678662 -0.5336301857
2  0.008636967  0.0045759242
3  0.010323864 -0.0003509441
4  0.003122096 -0.0033664451
5  0.001975966 -0.0008297575
6  0.002230518 -0.0005867041

| Excellent work!
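
The relationship between dfbeta and the leave-one-out difference can be checked outside swirl as well. The data frame out2 is only available inside the lesson, so the sketch below builds a hypothetical stand-in, toy2, whose first row is a deliberately placed outlier. The data are synthetic and the numbers will not match the lesson's.

```r
# Synthetic stand-in for out2 (NOT the lesson's data): row 1 is an outlier.
set.seed(42)
x <- rnorm(50)
y <- x + rnorm(50, sd = 0.5)
toy2 <- data.frame(y = c(0, y), x = c(8, x))

fit   <- lm(y ~ x, toy2)         # all rows
fitno <- lm(y ~ x, toy2[-1, ])   # row 1 left out

# Change in the coefficients caused by including row 1.
manual <- coef(fit) - coef(fitno)

# The first row of dfbeta(fit) should be the same change.
print(all.equal(dfbeta(fit)[1, ], manual))
```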

  |===============================                                                |  39%
| Comparing the first row with those below it, we see that the first sample has a much
| larger effect on the slope (the x column) than other samples. In fact, the magnitude
| of its effect is about 100 times that of any other point. Its effect on the intercept
| is not very distinctive essentially because its y coordinate is 0, the mean of the
| other samples.

...

  |==================================                                             |  42%
| When a sample is included in a model, it pulls the regression line closer to itself
| (orange line) than that of the model which excludes it (black line.) Its residual, the
| difference between its actual y value and that of a regression line, is thus smaller
| in magnitude when it is included (orange dots) than when it is omitted (black dots.)
| The ratio of these two residuals, orange to black, is therefore small in magnitude for
| an influential sample. For a sample which is not influential the ratio would be close
| to 1. Hence, 1 minus the ratio is a measure of influence, near 0 for points which are
| not influential, and near 1 for points which are.

...

  |====================================                                           |  45%
| This measure is sometimes called influence, sometimes leverage, and sometimes hat
| value. Since it is 1 minus the ratio of two residuals, to calculate it from scratch we
| must first obtain the two residuals. The ratio's numerator (orange dots) is the
| residual of the first sample of the model we called fit. The model fitno, which
| excludes this sample, also excludes its residual, so we will have to calculate its
| value. This is easily done. We use R's predict function to calculate fitno's predicted
| value of y and subtract it from the actual value. Use the expression resno <- out2[1,
| "y"] - predict(fitno, out2[1,]) to do the calculation.

> resno <- out2[1,"y"] - predict(fitno, out2[1,])

| Nice work!

  |======================================                                         |  48%
| Now calculate the influence of our outlier using 1-resid(fit)[1]/resno or an
| equivalent expression.

> 1-resid(fit)[1]/resno
        1 
0.6311547 

| You are amazing!

  |=========================================                                      |  52%
| hatvalues: The function, hatvalues, performs for every sample a calculation equivalent
| to the one you've just done. Thus the first entry of hatvalues(fit) should match the
| value which you have just calculated. Since there are quite a few samples, use
| head(hatvalues(fit)) or View(hatvalues(fit)) to compare the influence measure of our
| outlier to that of some other samples.

> head(hatvalues(fit))
         1          2          3          4          5          6 
0.63115474 0.02324999 0.01962520 0.04326099 0.02255531 0.02071441 

| Your dedication is inspiring!
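
The same check can be sketched on data that is available outside swirl. The data frame toy2 below is a hypothetical stand-in for out2, built from synthetic numbers with an outlier in row 1. It confirms that 1 minus the ratio of the two residuals equals the hat value of that row.

```r
# Synthetic stand-in for out2 (NOT the lesson's data): row 1 is an outlier.
set.seed(42)
x <- rnorm(50)
y <- x + rnorm(50, sd = 0.5)
toy2 <- data.frame(y = c(0, y), x = c(8, x))

fit   <- lm(y ~ x, toy2)         # all rows
fitno <- lm(y ~ x, toy2[-1, ])   # row 1 left out

# Residual of row 1 with respect to the model that excludes it.
resno <- toy2[1, "y"] - predict(fitno, toy2[1, ])

# Influence computed from scratch, as in the lesson.
h1 <- 1 - resid(fit)[1] / resno

# It should agree with the first hat value.
print(all.equal(unname(h1), unname(hatvalues(fit)[1])))
```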

  |===========================================                                    |  55%
| Residuals of individual samples are sometimes treated as having the same variance,
| which is estimated as the sample variance of the entire set of residuals.
| Theoretically, however, residuals of individual samples have different variances and
| these differences can become large in the presence of outliers. Standardized and
| Studentized residuals attempt to compensate for this effect in two slightly different
| ways. Both use hat values.

...

  |=============================================                                  |  58%
| We'll consider standardized residuals first. To begin, calculate the sample standard
| deviation of fit's residual by dividing fit's deviance, i.e., its residual sum of
| squares, by the residual degrees of freedom and taking the square root. Store the
| result in a variable called sigma.

> sigma <- sqrt(sum(fit$residuals^2)/(nrow(out2)-1))

| Try again. Getting it right on the first try is boring anyway! Or, type info() for
| more options.

| Enter sigma <- sqrt(deviance(fit)/df.residual(fit)) or an equivalent expression at the
| R prompt.

> sigma <- sqrt(deviance(fit)/df.residual(fit))

| That's the answer I was looking for.
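
The first attempt above was rejected because it divides by n - 1 rather than by the residual degrees of freedom, which is n - 2 for a model with an intercept and one slope. As a small sketch of the difference, the code below uses R's built-in cars data instead of out2; the names fit_cars, sigma_hand and sigma_wrong are illustrative only.

```r
# Built-in data set: 50 rows, stopping distance versus speed.
fit_cars <- lm(dist ~ speed, data = cars)

# Residual standard deviation as the lesson asks for it.
sigma_hand <- sqrt(deviance(fit_cars) / df.residual(fit_cars))

# The rejected variant: divides by n - 1 instead of the residual df (n - 2).
sigma_wrong <- sqrt(sum(resid(fit_cars)^2) / (nrow(cars) - 1))

print(c(residual_df = df.residual(fit_cars),
        sigma_hand  = sigma_hand,
        sigma_wrong = sigma_wrong,
        from_summary = summary(fit_cars)$sigma))
```

Only sigma_hand matches the residual standard error reported by summary().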

  |================================================                               |  61%
| Ordinarily we would just divide fit's residual (which has mean 0) by sigma. In the
| present case we multiply sigma times sqrt(1-hatvalues(fit)) to estimate standard
| deviations of individual samples. Thus, instead of dividing resid(fit) by sigma, we
| divide by sigma*sqrt(1-hatvalues(fit)). The result is called the standardized
| residual. Compute fit's standardized residual and store it in a variable named rstd.

> rstd <- resid(fit)/(sigma*sqrt(1-hatvalues(fit)))

| All that practice is paying off!

  |==================================================                             |  64%
| rstandard: The function, rstandard, computes the standardized residual which we have
| just computed step by step. Use head(cbind(rstd, rstandard(fit))) or View(cbind(rstd,
| rstandard(fit))) to compare the two calculations.

> head(cbind(rstd, rstandard(fit)))
        rstd           
1 -5.1928156 -5.1928156
2  0.9389601  0.9389601
3  1.0450409  1.0450409
4  0.2682743  0.2682743
5  0.1893339  0.1893339
6  0.2186961  0.2186961

| Your dedication is inspiring!
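
The same comparison can be run without swirl. This is a minimal sketch on hypothetical data (dat stands in for out2, and the object names are made up for illustration), using only base R.

```r
# Hypothetical stand-in for out2: row 1 is a high-leverage outlier.
set.seed(1)
x_rest <- rnorm(50)
dat <- data.frame(
  x = c(8, x_rest),
  y = c(-8, 0.5 * x_rest + rnorm(50, sd = 0.5))
)
fit <- lm(y ~ x, data = dat)

# Residual standard deviation of the full model.
sigma_hat <- sqrt(deviance(fit) / df.residual(fit))

# Standardized residual: each residual divided by its own estimated sd.
rstd_hand <- resid(fit) / (sigma_hat * sqrt(1 - hatvalues(fit)))

print(head(cbind(rstd_hand, rstandard = rstandard(fit))))
```

The variable is called sigma_hat here rather than sigma so that it does not shadow R's sigma() function.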

  |=====================================================                          |  67%
| A Scale-Location plot shows the square root of the absolute standardized residuals
| against fitted values. Use plot(fit, which=3) to display it.

> plot(fit, which=3)

| You got it right!

  |=======================================================                        |  70%
| Most of the diagnostic statistics under discussion were developed because of perceived
| shortcomings of other diagnostics and because their distributions under a null
| hypothesis could be characterized. The assumption that residuals are approximately
| normal is implicit in such characterizations. Since standardized residuals adjust for
| individual residual variances, a QQ plot of standardized residuals against normal with
| constant variance is of interest. Use plot(fit, which=2) to display this diagnostic
| plot.

> plot(fit, which=2)

| Keep up the great work!

  |=========================================================                      |  73%
| Look at the outlier's standardized residual, labeled 1 on the Normal QQ plot. About
| how many standard deviations from the mean is it?

1: About -2
2: About -5

Selection: 2

| You got it!

  |============================================================                   |  76%
| Studentized residuals (sometimes called externally Studentized residuals) estimate
| the standard deviations of individual residuals using, in addition to individual hat
| values, the deviance of a model which leaves the associated sample out. We'll
| illustrate using the outlier. Recalling that the model we called fitno omits the
| outlier sample, calculate the sample standard deviation of fitno's residual by
| dividing its deviance by its residual degrees of freedom and taking the square root.
| Store the result in a variable called sigma1.

> sigma1 <- sqrt(deviance(fitno)/df.residual(fitno))

| Perseverance, that's the answer.

  |==============================================================                 |  79%
| Calculate the Studentized residual for the outlier sample by dividing resid(fit)[1] by
| the product of sigma1 and sqrt(1-hatvalues(fit)[1]). There is no need to store this in
| a variable.

> resid(fit)[1]/(sigma1*sqrt(1-hatvalues(fit)[1]))
        1 
-7.664261 

| You are really on a roll!

  |=================================================================              |  82%
| rstudent: The function, rstudent, calculates Studentized residuals for each sample
| using a procedure equivalent to that which we just used for the outlier. Thus
| rstudent(fit)[1] should match the value we calculated in the previous question. Use
| head(rstudent(fit)) or View(rstudent(fit)) to verify this and to compare the
| Studentized residual of the outlier with those of other samples.

> head(rstudent(fit))
         1          2          3          4          5          6 
-7.6642608  0.9378046  1.0460451  0.2657179  0.1874606  0.2165588 

| Your dedication is inspiring!
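
The lesson checks only the outlier by hand. As a sketch of what "equivalent for each sample" means, the loop below refits the model once per omitted row and rebuilds every Studentized residual. The data frame dat is a hypothetical stand-in for out2, and all object names are illustrative.

```r
# Hypothetical stand-in for out2: row 1 is a high-leverage outlier.
set.seed(1)
x_rest <- rnorm(50)
dat <- data.frame(
  x = c(8, x_rest),
  y = c(-8, 0.5 * x_rest + rnorm(50, sd = 0.5))
)
fit <- lm(y ~ x, data = dat)

# Residual sd of the model that omits row i, for every i.
loo_sigma <- sapply(seq_len(nrow(dat)), function(i) {
  fit_i <- lm(y ~ x, data = dat[-i, ])
  sqrt(deviance(fit_i) / df.residual(fit_i))
})

# Externally Studentized residuals built from those leave-one-out sds.
rstud_hand <- resid(fit) / (loo_sigma * sqrt(1 - hatvalues(fit)))

print(head(cbind(rstud_hand, rstudent = rstudent(fit))))
```

rstudent() obtains the same values without refitting, so the loop is for illustration rather than practical use.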

  |===================================================================            |  85%
| Cook's distance is the last influence measure we will consider. It is essentially the
| sum of squared differences between values fitted with and without a particular sample.
| It is normalized by (divided by) the residual sample variance times the number of
| predictors, which is 2 in our case (the intercept and x). It essentially tells how much
| a given sample changes a model. We'll illustrate once again by calculating Cook's
| distance for the outlier.

...

  |=====================================================================          |  88%
| We'll begin by calculating the difference in predicted values between fit and fitno,
| the models which respectively include and omit the outlier. This is most easily done
| by subtracting predict(fit, out2) from predict(fitno, out2). Store the difference in a
| variable named dy.

> dy <- predict(fitno, out2) - predict(fit, out2)

| That's the answer I was looking for.

  |========================================================================       |  91%
| Recall that we calculated the sample standard deviation of fit's residual, sigma,
| earlier. Divide the summed squares of dy by 2*sigma^2 to calculate the outlier's
| Cook's distance. There is no need to store the result in a variable.

> sum(dy^2)/(2*sigma^2)
[1] 23.07105

| Keep working like that and you'll get there!

  |==========================================================================     |  94%
| cooks.distance: The function, cooks.distance, will calculate Cook's distance for each
| sample. Rather than verify that cooks.distance(fit)[1] is equal to the value just
| calculated, because that sort of thing must be getting tedious by now, display a
| diagnostic plot which uses Cook's distance using plot(fit, which=5).

> plot(fit, which=5)

| Your dedication is inspiring!
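
For readers who do want the verification that the lesson skips, here is a minimal sketch. It uses hypothetical data in place of out2 and repeats the lesson's calculation (summed squared change in predictions, divided by the number of coefficients times sigma squared) for every row, not only the outlier.

```r
# Hypothetical stand-in for out2: row 1 is a high-leverage outlier.
set.seed(1)
x_rest <- rnorm(50)
dat <- data.frame(
  x = c(8, x_rest),
  y = c(-8, 0.5 * x_rest + rnorm(50, sd = 0.5))
)
fit <- lm(y ~ x, data = dat)

p  <- length(coef(fit))                   # 2: intercept and slope
s2 <- deviance(fit) / df.residual(fit)    # residual variance of the full model

# Cook's distance for row i: change in all fitted values when row i is omitted.
cook_hand <- sapply(seq_len(nrow(dat)), function(i) {
  fit_i <- lm(y ~ x, data = dat[-i, ])
  dy <- predict(fit_i, dat) - predict(fit, dat)
  sum(dy^2) / (p * s2)
})

print(head(cbind(cook_hand, cooks.distance = cooks.distance(fit))))
```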

  |=============================================================================  |  97%
| That concludes swirl's coverage of Residuals, Diagnostics, and Variation. The HTML5
| slides for this as well as other units in the Johns Hopkins Data Science
| Specialization can be found here:
| https://github.com/DataScienceSpecialization/courses. They must be downloaded and
| viewed locally.

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

| Excellent work!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 9.

3.11 Lesson 10: Variance Inflation Factors

This lesson requires the car package.

| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Regression Models
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 2

| Please choose a lesson, or type 0 to return to course menu.

 1: Introduction
 2: Residuals
 3: Least Squares Estimation
 4: Residual Variation
 5: Introduction to Multivariable Regression
 6: MultiVar Examples
 7: MultiVar Examples2
 8: MultiVar Examples3
 9: Residuals Diagnostics and Variation
10: Variance Inflation Factors
11: Overfitting and Underfitting
12: Binary Outcomes
13: Count Outcomes

Selection: 10

| Attempting to load lesson dependencies...

| This lesson requires the ‘car’ package. Would you like me to install it for you now?

1: Yes
2: No

Selection: 1

| Trying to install package ‘car’ now...
also installing the dependencies ‘zip’, ‘SparseM’, ‘MatrixModels’, ‘sp’, ‘data.table’, ‘openxlsx’, ‘carData’, ‘abind’, ‘pbkrtest’, ‘quantreg’, ‘maptools’, ‘rio’


| Package ‘car’ loaded correctly!

  |                                                                               |   0%

| Variance Inflation Factors. (Slides for this and other Data Science courses may be
| found at github https://github.com/DataScienceSpecialization/courses. If you care to
| use them, they must be downloaded as a zip file and viewed locally. This lesson
| corresponds to Regression_Models/02_04_residuals_variation_diagnostics.)

...

  |===                                                                            |   4%
| In modeling, our interest lies in parsimonious, interpretable representations of the
| data that enhance our understanding of the phenomena under study. Omitting variables
| results in bias in the coefficients of interest - unless their regressors are
| uncorrelated with the omitted ones. On the other hand, including any new variables
| increases (actual, not estimated) standard errors of other regressors. So we don't
| want to idly throw variables into the model. This lesson is about the second of these
| two issues, which is known as variance inflation.

...

  |=======                                                                        |   8%
| We shall use simulations to illustrate variance inflation. The source code for these
| simulations is in a file named vifSims.R which I have copied into your working
| directory and tried to display in your source code editor. If I've failed to display
| it, you should open it manually.

...

  |==========                                                                     |  12%
| Find the function, makelms, at the top of vifSims.R. The final expression in makelms
| creates 3 linear models. The first, lm(y ~ x1), predicts y in terms of x1, the second
| predicts y in terms of x1 and x2, the third in terms of all three regressors. The
| second coefficient of each model, for instance coef(lm(y ~ x1))[2], is extracted and
| returned in a 3-long vector. What does this second coefficient represent?

1: The coefficient of the intercept.
2: The coefficient of x2.
3: The coefficient of x1.

Selection: 3

| Your dedication is inspiring!

  |=============                                                                  |  17%
| In makelms, the simulated dependent variable, y, depends on which of the regressors?

1: x1 and x2
2: x1
3: x1, x2, and x3

Selection: 2

| Your dedication is inspiring!

  |================                                                               |  21%
| In vifSims.R, find the functions, rgp1() and rgp2(). Both functions generate 3
| regressors, x1, x2, and x3. Compare the lines following the comment Point A in rgp1()
| with those following Point C in rgp2(). Which of the following statements about x1,
| x2, and x3 is true?

1: x1, x2, and x3 are correlated in rgp1(), but not in rgp2().
2: x1, x2, and x3 are uncorrelated in rgp1(), but not in rgp2().
3: x1, x2, and x3 are uncorrelated in both rgp1() and rgp2().
4: x1, x2, and x3 are correlated in both rgp1() and rgp2().

Selection: 2

| You got it!

  |====================                                                           |  25%
| In the line following Point B in rgp1(), the function makelms(x1, x2, x3) is applied
| 1000 times. Each time it is applied, it simulates a new dependent variable, y, and
| returns estimates of the coefficient of x1 for each of the 3 models, y ~ x1, y ~ x1 +
| x2, and y ~ x1 + x2 + x3. It thus computes 1000 estimates of the 3 coefficients,
| collecting the results in a 3x1000 array, betas. In the next line, the expression,
| apply(betas, 1, var), does which of the following?

1: Computes the variance of each row.
2: Computes the variance of each column.

Selection: 1

| Your dedication is inspiring!
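
The role of the second argument of apply() is easy to see on a tiny matrix. The values below are made up for illustration; only the shape (3 rows, several columns) mirrors the betas array described above.

```r
# A small 3 x 4 matrix: think of each row as one model, each column as one simulation.
betas <- matrix(c(1, 2, 3, 4,
                  2, 4, 6, 8,
                  5, 5, 5, 5),
                nrow = 3, byrow = TRUE)

row_var <- apply(betas, 1, var)  # MARGIN = 1: one variance per row
col_var <- apply(betas, 2, var)  # MARGIN = 2: one variance per column

print(row_var)
print(col_var)
```

With MARGIN = 1 the result has one entry per row, which is why rgp1() returns three variances, one per model.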

  |=======================                                                        |  29%
| The function rgp1() computes the variance in estimates of the coefficient of x1 in
| each of the three models, y ~ x1, y ~ x1 + x2, and y ~ x1 + x2 + x3. (The results are
| rounded to 5 decimal places for convenient viewing.) This simulation approximates the
| variance (i.e., squared standard error) of x1's coefficient in each of these three
| models. Recall that variance inflation is due to correlated regressors and that in
| rgp1() the regressors are uncorrelated. Run the simulation rgp1() now. Be patient. It
| takes a while.

> rgp1()
[1] "Processing. Please wait."
     x1      x1      x1 
0.00110 0.00111 0.00112 

| You are quite good my friend!

  |==========================                                                     |  33%
| The variances in each of the three models are approximately equal, as expected, since
| the other regressors, x2 and x3, are uncorrelated with the regressor of interest, x1.
| However, in rgp2(), x2 and x3 both depend on x1, so we should expect an effect. From
| the expressions assigning x2 and x3 which follow Point C, which is more strongly
| correlated with x1?

1: x2
2: x3

Selection: 2

| Keep working like that and you'll get there!

  |==============================                                                 |  38%
| Run rgp2() to simulate standard errors in the coefficient of x1 for cases in which x1
| is correlated with the other regressors.

> rgp2()
[1] "Processing. Please wait."
     x1      x1      x1 
0.00110 0.00240 0.00981 

| You are quite good my friend!

  |=================================                                              |  42%
| In this case, variance inflation due to correlated regressors is clear, and is most
| pronounced in the third model, y ~ x1 + x2 + x3, since x3 is the regressor most
| strongly correlated with x1.

...
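
The file vifSims.R is not reproduced in this log. The following is a rough sketch of the kind of simulation the lesson describes, not the lesson's actual code: the function name simulate_var, the seed, the number of simulations, and the weights used to make x2 and x3 correlated with x1 are all assumptions chosen for illustration.

```r
# Hypothetical re-creation of the idea; NOT the lesson's vifSims.R.
# For fixed regressors, simulate y many times and record the x1 coefficient
# estimated by three nested models.
simulate_var <- function(x1, x2, x3, nosim = 500) {
  betas <- sapply(seq_len(nosim), function(i) {
    y <- x1 + rnorm(length(x1), sd = 0.3)   # y truly depends on x1 only
    c(coef(lm(y ~ x1))[2],
      coef(lm(y ~ x1 + x2))[2],
      coef(lm(y ~ x1 + x2 + x3))[2])
  })
  apply(betas, 1, var)                      # one variance per model (row)
}

set.seed(4321)
n  <- 100
x1 <- rnorm(n)

# Case 1: the extra regressors are independent of x1.
x2_u <- rnorm(n)
x3_u <- rnorm(n)
v_uncor <- simulate_var(x1, x2_u, x3_u)

# Case 2 (assumed weights): x2 moderately, x3 strongly correlated with x1.
x2_c <- x1 / sqrt(2) + rnorm(n) / sqrt(2)
x3_c <- 0.95 * x1 + sqrt(1 - 0.95^2) * rnorm(n)
v_cor <- simulate_var(x1, x2_c, x3_c)

print(round(rbind(uncorrelated = v_uncor, correlated = v_cor), 5))
```

In the uncorrelated case the three variances should be close to each other, while in the correlated case the variance should grow as more strongly correlated regressors are added.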

  |====================================                                           |  46%
| In these two simulations we had 1000 samples of estimated coefficients, hence could
| calculate sample variance in order to illustrate the effect. In a real case, we have
| only one set of coefficients and we depend on theoretical estimates. However,
| theoretical estimates contain an unknown constant of proportionality. We therefore
| depend on ratios of theoretical estimates called Variance Inflation Factors, or VIFs.

...

  |========================================                                       |  50%
| A variance inflation factor (VIF) is a ratio of estimated variances, the variance due
| to including the ith regressor, divided by that due to including a corresponding ideal
| regressor which is uncorrelated with the others. VIF's can be calculated directly, but
| the car package provides a convenient method for the purpose as we will illustrate
| using the Swiss data from the datasets package.

...

  |===========================================                                    |  54%
| According to its documentation, the Swiss data set consists of a standardized
| fertility measure and socioeconomic indicators for each of 47 French-speaking
| provinces of Switzerland in about 1888 when Swiss fertility rates began to fall. Type
| head(swiss) or View(swiss) to examine the data.

> head(swiss)
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Courtelary        80.2        17.0          15        12     9.96             22.2
Delemont          83.1        45.1           6         9    84.84             22.2
Franches-Mnt      92.5        39.7           5         5    93.40             20.2
Moutier           85.8        36.5          12         7    33.77             20.3
Neuveville        76.9        43.5          17        15     5.16             20.6
Porrentruy        76.1        35.3           9         7    90.57             26.6

| Excellent job!

  |==============================================                                 |  58%
| Fertility was thought to depend on five socioeconomic factors: the percent of males
| working in Agriculture, the percent of draftees receiving the highest grade on the
| army's Examination, the percent of draftees with Education beyond primary school, the
| percent of the population which was Roman Catholic, and the rate of Infant Mortality
| in the province. Use linear regression to model Fertility in terms of these five
| regressors and an intercept. Store the model in a variable named mdl.

> mdl <- lm(Fertility ~ Agriculture + Examination + Education + Catholic + Infant.Mortality, data = swiss)

| Keep working like that and you'll get there!

  |=================================================                              |  62%
| Calculate the VIF's for each of the regressors using vif(mdl).

> vif(mdl)
     Agriculture      Examination        Education         Catholic Infant.Mortality 
        2.284129         3.675420         2.774943         1.937160         1.107542 

| You got it right!
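
The lesson notes that VIFs can be calculated directly. One common way, sketched here with base R only, is to regress each regressor on all the others and take 1 / (1 - R^2). The names regressors and vif_hand are illustrative; the result should agree with the vif(mdl) output above.

```r
# swiss ships with base R (datasets package).
regressors <- c("Agriculture", "Examination", "Education",
                "Catholic", "Infant.Mortality")

# For each regressor, R^2 from regressing it on the remaining four.
vif_hand <- sapply(regressors, function(v) {
  others <- setdiff(regressors, v)
  r2 <- summary(lm(reformulate(others, response = v), data = swiss))$r.squared
  1 / (1 - r2)
})

print(round(vif_hand, 6))
```

A regressor that is well predicted by the others has R^2 near 1 and therefore a large VIF.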

  |=====================================================                          |  67%
| These VIF's show, for each regression coefficient, the variance inflation due to
| including all the others. For instance, the variance in the estimated coefficient of
| Education is 2.774943 times what it might have been if Education were not correlated
| with the other regressors. Since Education and score on an Examination are likely to
| be correlated, we might guess that most of the variance inflation for Education is due
| to including Examination.

...

  |========================================================                       |  71%
| Make a second linear model of Fertility in which Examination is omitted, but the other
| four regressors are included. Store the result in a variable named mdl2.

> mdl2 <- lm(Fertility ~ Agriculture + Education + Catholic + Infant.Mortality, data = swiss)

| You are quite good my friend!

  |===========================================================                    |  75%
| Calculate the VIF's for this model using vif(mdl2).

> vif(mdl2)
     Agriculture        Education         Catholic Infant.Mortality 
        2.147153         1.816361         1.299916         1.107528 

| Great job!

  |===============================================================                |  79%
| As expected, omitting Examination has markedly decreased the VIF for Education, from
| 2.774943 to 1.816361. Note that omitting Examination has had almost no effect on the VIF
| for Infant Mortality. Chances are Examination and Infant Mortality are not strongly
| correlated. Now, before finishing this lesson, let's review several significant
| points.

...

  |==================================================================             |  83%
| A VIF describes the increase in the variance of a coefficient due to the correlation
| of its regressor with the other regressors. What is the relationship of a VIF to the
| standard error of its coefficient?

1: There is no relationship.
2: VIF is the square of standard error inflation.
3: They are the same.

Selection: 2

| That's the answer I was looking for.
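
The answer can be checked numerically on the swiss model. In this sketch, se_ideal is the standard error each coefficient would have if its regressor were uncorrelated with the others (same residual variance, same spread of the regressor); the squared ratio of the actual standard error to se_ideal should reproduce the VIF. The object names are illustrative, and the VIFs are taken from the inverse correlation matrix of the regressors, a standard identity, rather than from the car package.

```r
mdl <- lm(Fertility ~ ., data = swiss)

# Actual standard errors of the five slope coefficients.
se <- summary(mdl)$coefficients[-1, "Std. Error"]

# Residual variance and the regressors, in the same order as se.
s2 <- deviance(mdl) / df.residual(mdl)
X  <- swiss[, names(se)]

# Standard error with no correlation among regressors.
se_ideal <- sqrt(s2 / ((nrow(swiss) - 1) * sapply(X, var)))

# VIFs as the diagonal of the inverse correlation matrix.
vif_hand <- diag(solve(cor(X)))

print(round(cbind(se_ratio_squared = (se / se_ideal)^2, vif = vif_hand), 6))
```

Taking square roots, a VIF of about 2.77 for Education corresponds to a standard error roughly 1.67 times larger than in the uncorrelated case.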

  |=====================================================================          |  88%
| If a regressor is strongly correlated with others, hence will increase their VIF's,
| why shouldn't we just exclude it?

1: We should never exclude anything.
2: Excluding it might bias coefficient estimates of regressors with which it is correlated.
3: We should always exclude it.

Selection: 2

| That's a job well done!

  |========================================================================       |  92%
| The problems of variance inflation and bias due to excluded regressors both involve
| correlated regressors. However there are methods, such as factor analysis or principal
| component analysis, which can convert regressors to an equivalent uncorrelated set.
| Why then, when modeling, should we not just use uncorrelated regressors and avoid all
| the trouble?

1: Factor analysis takes too much computation.
2: Using converted regressors may make interpretation difficult.
3: We should always use uncorrelated regressors.

Selection: 2

| All that practice is paying off!

  |============================================================================   |  96%
| That completes the exercise in variance inflation. The issue of omitting regressors is
| discussed in another lesson.

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

| You got it right!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 10.

3.12 Lesson 11: Overfitting and Underfitting


| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Regression Models
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 0

| Leaving swirl now. Type swirl() to resume.

> swirl()

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you
| did then. If you are new, call yourself something unique.

What shall I call you? SL

| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Regression Models
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 2

| Please choose a lesson, or type 0 to return to course menu.

 1: Introduction
 2: Residuals
 3: Least Squares Estimation
 4: Residual Variation
 5: Introduction to Multivariable Regression
 6: MultiVar Examples
 7: MultiVar Examples2
 8: MultiVar Examples3
 9: Residuals Diagnostics and Variation
10: Variance Inflation Factors
11: Overfitting and Underfitting
12: Binary Outcomes
13: Count Outcomes

Selection: 11

| Attempting to load lesson dependencies...

| Package ‘car’ loaded correctly!

  |                                                                               |   0%

| Overfitting and Underfitting. (Slides for this and other Data Science courses may be
| found at github https://github.com/DataScienceSpecialization/courses. If you care to
| use them, they must be downloaded as a zip file and viewed locally. This lesson
| corresponds to Regression_Models/02_04_residuals_variation_diagnostics.)

...

  |===                                                                            |   4%
| The Variance Inflation Factors lesson demonstrated that including new variables will
| increase standard errors of coefficient estimates of other, correlated regressors.
| Hence, we don't want to idly throw variables into the model. On the other hand,
| omitting variables results in bias in coefficients of regressors which are correlated
| with the omitted ones. In this lesson we demonstrate the effect of omitted variables
| and discuss the use of ANOVA to construct parsimonious, interpretable representations
| of the data.

...

  |======                                                                         |   7%
| First, I would like to illustrate how omitting a correlated regressor can bias
| estimates of a coefficient. The relevant source code is in a file named fitting.R
| which I have copied into your working directory and tried to display in your source
| code editor. If I've failed to display it, you should open it manually.

...

Contents of fitting.R:

simbias <- function(seed=8765){
  # The default seed guarantees a nice histogram. This is the only
  # reason that accepting the default, x1c <- simbias(), is required in the lesson. 
  # The effect will be evident with other seeds as well.
  set.seed(seed) 
  temp <- rnorm(100)
  # Point A
  x1 <- (temp + rnorm(100))/sqrt(2)
  x2 <- (temp + rnorm(100))/sqrt(2)
  x3 <- rnorm(100)
  # Function to simulate regression of y on 2 variables.
  f <- function(k){
    # Point B
    y <- x1 + x2 + x3 + .3*rnorm(100)
    # Point C
    c(lm(y ~ x1 + x2)$coef[2],
       lm(y ~ x1 + x3)$coef[2])
  }
  # Point D
  sapply(1:150, f)
}

# Illustrate the effect of bogus regressors on residual squared error.
bogus <- function(){
  temp <- swiss
  # Add 41 columns of random regressors to a copy of the swiss data.
  for(n in 1:41){temp[,paste0("random",n)] <- rnorm(nrow(temp))}
  # Define a function to compute the deviance of Fertility regressed
  # on all regressors up to column n. The function, deviance(model), computes
  # the residual sum of squares of the model given as its argument.
  f <- function(n){deviance(lm(Fertility ~ ., temp[,1:n]))}
  # Apply f to data from n=6, i.e., the legitimate regressors,
  # through n=47, i.e., a full complement of bogus regressors.
  rss <- sapply(6:47, f)
  # Display result.
  plot(0:41, rss, xlab="Number of bogus regressors.", ylab="Residual squared error.",
       main="Residual Squared Error for Swiss Data\nUsing Irrelevant (Bogus) Regressors",
       pch=21, bg='red')
}

# Plot histograms illustrating bias in estimates of a regressor
# coefficient 1) when an uncorrelated regressor is missing and
# 2) when a correlated regressor is missing.
x1hist <- function(x1c){
  p1 <- hist(x1c[1,], plot=FALSE)
  p2 <- hist(x1c[2,], plot=FALSE)
  yrange <- c(0, max(p1$counts, p2$counts))
  plot(p1, col=rgb(0,0,1,1/4), xlim=range(x1c), ylim=yrange, xlab="Estimated coefficient of x1",
        main="Bias Effect of Omitted Regressor")
  plot(p2, col=rgb(1,0,0,1/4), xlim=range(x1c), ylim=yrange, add=TRUE)
  legend(1.1, 40, c("Uncorrelated regressor, x3, omitted", "Correlated regressor, x2, omitted"),
         fill=c(rgb(0,0,1,1/4), rgb(1,0,0,1/4)))
}

  |=========                                                                      |  11%
| Find the function simbias() at the top of fitting.R. Below the comment labeled Point A
| three regressors, x1, x2, and x3, are defined. Which two of these are correlated?

1: x1 and x2
2: x1 and x3
3: x2 and x3

Selection: 1

| You're the best!

  |============                                                                   |  15%
| Within simbias() another function, f(n), is defined. It forms a dependent variable, y,
| and at Point C returns the coefficient of x1 as estimated by two models, y ~ x1 + x2,
| and y ~ x1 + x3. One regressor is missing in each model. In the expression for y
| (Point B), what is the actual coefficient of x1?

1: 0.3
2: 1
3: 1/sqrt(2)

Selection: 2

| You are doing so well!

  |===============                                                                |  19%
| At Point D in simbias() the internal function, f(), is applied 150 times and the
| results returned as a 2x150 matrix. The first row of this matrix contains independent
| estimates of x1's coefficient in the case that x3, the regressor uncorrelated with x1,
| is omitted. The second row contains estimates of x1's coefficient when the correlated
| regressor, x2, is omitted. Use simbias(), accepting the default argument, to form
| these estimates and store the result in a variable called x1c. (The default argument
| just guarantees a nice histogram, in a figure to follow.)

> x1c <- simbias()

| Excellent work!

  |==================                                                             |  22%
| The actual coefficient of x1 is 1. Having been warned that omitting a correlated
| regressor would bias estimates of x1's coefficient, we would expect the mean estimate
| of x1c's second row to be farther from 1 than the mean of x1c's first row. Using
| apply(x1c, 1, mean), find the means of each row.

> apply(x1c, 1, mean)
      x1       x1 
1.034403 1.476944 

| All that hard work is paying off!
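
The size of the bias can be explained with the omitted-variable identity: the x1 coefficient of the short model equals the x1 coefficient of the full model plus the omitted regressor's coefficient times the slope of x1 in an auxiliary regression of the omitted regressor on the kept ones. The sketch below reuses the regressor definitions from simbias() in fitting.R for a single simulated y; the names full, short, aux and predicted_short are illustrative.

```r
# Same regressor construction as simbias() in fitting.R.
set.seed(8765)
temp <- rnorm(100)
x1 <- (temp + rnorm(100)) / sqrt(2)
x2 <- (temp + rnorm(100)) / sqrt(2)   # shares temp with x1, hence correlated
x3 <- rnorm(100)
y  <- x1 + x2 + x3 + 0.3 * rnorm(100)

full  <- lm(y ~ x1 + x2 + x3)   # nothing omitted
short <- lm(y ~ x1 + x3)        # correlated regressor x2 omitted
aux   <- lm(x2 ~ x1 + x3)       # how x2 projects onto the kept regressors

# Omitted-variable identity for the coefficient of x1.
predicted_short <- unname(coef(full)["x1"] + coef(full)["x2"] * coef(aux)["x1"])
actual_short    <- unname(coef(short)["x1"])

print(c(full = unname(coef(full)["x1"]),
        predicted_short = predicted_short,
        actual_short = actual_short))
```

Because x1 and x2 share the component temp, the auxiliary slope is far from zero, and the short model attributes part of x2's effect to x1.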

  |====================                                                           |  26%
| Histograms of estimates from x1c's first row (blue) and second row (red) are shown.
| Estimates from the second row are clearly more than two standard deviations from the
| correct value of 1, and the bias due to omitting the correlated regressor is evident.
| (The code which produced this figure is incidental to the lesson, but is available as
| the function x1hist(), at the bottom of fitting.R.)

...

  |=======================                                                        |  30%
| Adding even irrelevant regressors can cause a model to tend toward a perfect fit. We
| illustrate this by adding random regressors to the swiss data and regressing on
| progressively more of them. As the number of regressors approaches the number of data
| points (47), the residual sum of squares, also known as the deviance, approaches 0.
| (The source code for this figure can be found as function bogus() in fitting.R.)

...

  |==========================                                                     |  33%
| In the figure, adding random regressors decreased deviance, but we would be mistaken
| to believe that such decreases are significant. To assess significance, we should take
| into account that adding regressors reduces residual degrees of freedom. Analysis of
| variance (ANOVA) is a useful way to quantify the significance of additional
| regressors. To exemplify its use, we will use the swiss data.

...

  |=============================                                                  |  37%
| Recall that the Swiss data set consists of a standardized fertility measure and
| socioeconomic indicators for each of 47 French-speaking provinces of Switzerland in
| 1888. Fertility was thought to depend on an intercept and five factors denoted as
| Agriculture, Examination, Education, Catholic, and Infant Mortality. To begin our
| ANOVA example, regress Fertility on Agriculture and store the result in a variable
| named fit1.

> fit1 <- lm(Fertility ~ Agriculture, data = swiss)

| All that hard work is paying off!

  |================================                                               |  41%
| Create another model, named fit3, by regressing Fertility on Agriculture and two
| additional regressors, Examination and Education.

> fit3 <- lm(Fertility ~ Agriculture + Examination + Education, data = swiss)

| That's a job well done!

  |===================================                                            |  44%
| We'll now use anova to assess the significance of the two added regressors. The null
| hypothesis is that the added regressors are not significant. We'll explain in detail
| shortly, but right now just apply the significance test by entering anova(fit1, fit3).

> anova(fit1, fit3)
Analysis of Variance Table

Model 1: Fertility ~ Agriculture
Model 2: Fertility ~ Agriculture + Examination + Education
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1     45 6283.1                                  
2     43 3180.9  2    3102.2 20.968 4.407e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

| All that hard work is paying off!

  |======================================                                         |  48%
| The three asterisks, ***, at the lower right of the printed table indicate that the
| null hypothesis is rejected at the 0.001 level, so at least one of the two additional
| regressors is significant. Rejection is based on a right-tailed F test, Pr(>F),
| applied to an F value. According to the table, what is that F value?

1: 20.968
2: 45
3: 3102.2

Selection: 1

| All that practice is paying off!

  |=========================================                                      |  52%
| An F statistic is a ratio of two sums of squares divided by their respective degrees
| of freedom. If the two scaled sums are independent and centrally chi-squared
| distributed with the same variance, the statistic will have an F distribution with
| parameters given by the two degrees of freedom. In our case, the two sums are residual
| sums of squares which, as we know, have mean zero hence are centrally chi-squared
| provided the residuals themselves are normally distributed. The two relevant sums are
| given in the RSS (Residual Sum of Squares) column of the table. What are they?

1: 45 and 43
2: 2 and 3102.2
3: 6283.1 and 3180.9

Selection: 3

| That's a job well done!

  |============================================                                   |  56%
| R's function, deviance(model), calculates the residual sum of squares, also known as
| the deviance, of the linear model given as its argument. Using deviance(fit3), verify
| that 3180.9 is fit3's residual sum of squares. (Of course, fit3 is called Model 2 in
| the table.)

> deviance(fit3)
[1] 3180.925

| You are doing so well!

  |===============================================                                |  59%
| In the next several steps, we will show how to calculate the F value, 20.968, which
| appears in the table printed by anova(). We'll begin with the denominator, which is
| fit3's residual sum of squares divided by its degrees of freedom. Fit3 has 43 residual
| degrees of freedom. This figure is obtained by subtracting 4, the number of fit3's
| predictors (the 3 named and the intercept), from 47, the number of samples in swiss.
| Store the value of deviance(fit3)/43 in a variable named d.

> d <- deviance(fit3)/43

| Perseverance, that's the answer.

  |==================================================                             |  63%
| The numerator is the difference, deviance(fit1)-deviance(fit3), divided by the
| difference in the residual degrees of freedom of fit1 and fit3, namely 2. This
| calculation requires some theoretical justification which we omit, but the essential
| idea is that fit3 has 2 predictors in addition to those of fit1. Calculate the
| numerator and store it in a variable named n.

> n <- (deviance(fit1)-deviance(fit3))/2

| Your dedication is inspiring!

  |=====================================================                          |  67%
| Calculate the ratio, n/d, to show it is essentially equal to the F value, 20.968,
| given by anova().

> n/d
[1] 20.96783

| You're the best!

  |========================================================                       |  70%
| We'll now calculate the p-value, which is the probability that a value of n/d or
| larger would be drawn from an F distribution which has parameters 2 and 43. This value
| was given as 4.407e-07 in the column labeled Pr(>F) in the table printed by anova(), a
| very unlikely value if the null hypothesis were true. Calculate this p-value using
| pf(n/d, 2, 43, lower.tail=FALSE).

> pf(n/d, 2, 43, lower.tail=FALSE)
[1] 4.406913e-07

| Great job!
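
The hand calculation above types in the degrees of freedom (43 and 2). As a minimal sketch, the same F value and p-value can be read off the fitted models with df.residual() and deviance(), so nothing has to be counted by hand. The helper name f_test_nested is hypothetical and exists only for this illustration.

```r
# Sketch: F test for two nested linear models, taking the degrees of
# freedom from the fits. `f_test_nested` is a hypothetical helper name.
f_test_nested <- function(small, big) {
  df_num <- df.residual(small) - df.residual(big)   # number of added regressors
  df_den <- df.residual(big)                        # residual df of the larger model
  num <- (deviance(small) - deviance(big)) / df_num
  den <- deviance(big) / df_den
  f <- num / den
  c(F = f, p = pf(f, df_num, df_den, lower.tail = FALSE))
}

fit1 <- lm(Fertility ~ Agriculture, data = swiss)
fit3 <- lm(Fertility ~ Agriculture + Examination + Education, data = swiss)
res <- f_test_nested(fit1, fit3)
print(res)   # compare with n/d and the pf() result above
```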

  |===========================================================                    |  74%
| Based on the calculated p-value, a false rejection of the null hypothesis is extremely
| unlikely. We are confident that fit3 is significantly better than fit1, with one
| caveat: analysis of variance is sensitive to its assumption that model residuals are
| approximately normal. If they are not, we could get a small p-value for that reason.
| It is thus worth testing residuals for normality. The Shapiro-Wilk test is quick and
| easy in R. Normality is its null hypothesis. Use shapiro.test(fit3$residuals) to test
| the residuals of fit3.

> shapiro.test(fit3$residuals)

    Shapiro-Wilk normality test

data:  fit3$residuals
W = 0.97276, p-value = 0.336


| Great job!

  |=============================================================                  |  78%
| The Shapiro-Wilk p-value of 0.336 fails to reject normality, supporting confidence in
| our analysis of variance. In order to illustrate the use of anova() with more than two
| models, I have constructed fit5 and fit6 using the first 5 and all 6 regressors
| (including the intercept) respectively. Thus fit1, fit3, fit5, and fit6 form a nested
| sequence of models; the regressors of one are included in those of the next. Enter
| anova(fit1, fit3, fit5, fit6) at the R prompt now to get the flavor.

> anova(fit1, fit3, fit5, fit6)
Analysis of Variance Table

Model 1: Fertility ~ Agriculture
Model 2: Fertility ~ Agriculture + Examination + Education
Model 3: Fertility ~ Agriculture + Examination + Education + Catholic
Model 4: Fertility ~ Agriculture + Examination + Education + Catholic + 
    Infant.Mortality
  Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
1     45 6283.1                                   
2     43 3180.9  2   3102.19 30.2107 8.638e-09 ***
3     42 2513.8  1    667.13 12.9937 0.0008387 ***
4     41 2105.0  1    408.75  7.9612 0.0073357 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

| That's correct!
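
The F value for Model 2 in this table is 30.2107, not the 20.968 obtained from anova(fit1, fit3). The numbers in the table are consistent with every row being divided by the residual mean square of the largest model in the sequence (Model 4), rather than by that of the model in the row itself. The sketch below checks this. fit6 is created inside swirl, so it is rebuilt here from the formula printed for Model 4, which is an assumption.

```r
fit1 <- lm(Fertility ~ Agriculture, data = swiss)
fit3 <- lm(Fertility ~ Agriculture + Examination + Education, data = swiss)
# Assumed reconstruction of swirl's fit6, using the formula shown for Model 4.
fit6 <- lm(Fertility ~ Agriculture + Examination + Education + Catholic +
             Infant.Mortality, data = swiss)

# Common denominator: residual mean square of the largest model.
d_big <- deviance(fit6) / df.residual(fit6)
# Numerator for the fit1 -> fit3 step, exactly as computed by hand earlier.
n <- (deviance(fit1) - deviance(fit3)) / 2
f_row2 <- n / d_big
print(f_row2)   # compare with the F column for Model 2 in the table above
```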

  |================================================================               |  81%
| It appears that each model is a significant improvement on its predecessor. Before
| ending the lesson, let's review a few salient points.

...

  |===================================================================            |  85%
| Omitting a regressor can bias estimation of the coefficient of certain other
| regressors. Which ones?

1: Correlated regressors
2: Uncorrelated regressors

Selection: 1

| Nice work!

  |======================================================================         |  89%
| Including more regressors will reduce a model's residual sum of squares, even if the
| new regressors are irrelevant. True or False?

1: False
2: It depends on circumstances.
3: True

Selection: 3

| Great job!
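
A small sketch of this point: a regressor made of pure random noise is added to fit1, and the residual sum of squares still does not go up. The column name noise and the seed are arbitrary choices for this illustration, not part of the swirl lesson.

```r
set.seed(1)                                # arbitrary seed, only for reproducibility
swiss_noise <- swiss
swiss_noise$noise <- rnorm(nrow(swiss))    # pure noise, unrelated to Fertility

fit1      <- lm(Fertility ~ Agriculture, data = swiss_noise)
fit_noise <- lm(Fertility ~ Agriculture + noise, data = swiss_noise)

rss <- c(without_noise = deviance(fit1), with_noise = deviance(fit_noise))
print(rss)

# The F test asks whether the drop in RSS is more than chance alone would give.
print(anova(fit1, fit_noise))
```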

  |=========================================================================      |  93%
| When adding regressors, the reduction in residual sums of squares should be tested for
| significance above and beyond that of reducing residual degrees of freedom. R's
| anova() function uses an F-test for this purpose. What else should be done to ensure
| that anova() applies?

1: Regressors should be tested for normality.
2: The residuals should be tested for having zero means.
3: Model residuals should be tested for normality.

Selection: 3

| You're the best!

  |============================================================================   |  96%
| That completes the lesson on underfitting and overfitting.

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

| That's correct!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 11.

3.13 Lesson 12: Binary Outcomes

| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Regression Models
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 2

| Please choose a lesson, or type 0 to return to course menu.

 1: Introduction
 2: Residuals
 3: Least Squares Estimation
 4: Residual Variation
 5: Introduction to Multivariable Regression
 6: MultiVar Examples
 7: MultiVar Examples2
 8: MultiVar Examples3
 9: Residuals Diagnostics and Variation
10: Variance Inflation Factors
11: Overfitting and Underfitting
12: Binary Outcomes
13: Count Outcomes

Selection: 12
  |                                                                               |   0%

| Binary Outcomes. (Slides for this and other Data Science courses may be found at
| github https://github.com/DataScienceSpecialization/courses. If you care to use them,
| they must be downloaded as a zip file and viewed locally. This lesson corresponds to
| Regression_Models/03_02_binaryOutcomes.)

...

  |===                                                                            |   4%
| Frequently we care about outcomes that have two values such as alive or dead, win or
| lose, success or failure. Such outcomes are called binary, Bernoulli, or 0/1. A
| collection of exchangeable binary outcomes for the same covariate data are called
| binomial outcomes. (Outcomes are exchangeable if their order doesn't matter.)

...

  |======                                                                         |   8%
| In this unit we will use glm() to model a process with a binary outcome and a
| continuous predictor. We will also learn how to interpret glm coefficients, and how to
| find confidence intervals. But first, let's discuss odds.

...

  |=========                                                                      |  12%
| The Baltimore Ravens are a team in the National Football League. In post season
| (championship) play they win about 2/3 of their games. In other words, they win about
| twice as often as they lose. If I wanted to bet on them, I would have to offer 2-to-1
| odds--if they lost I would pay you $2, but if they won you would pay me only $1. That
| way, in the long run over many bets, we'd both expect to win as much money as we'd
| lost.

...

  |============                                                                   |  15%
| During the regular season the Ravens win about 55% of their games. What odds would I
| have to offer in the regular season?

1: Any of these
2: 11 to 9
3: 55 to 45
4: 1.22222 to 1

Selection: 1

| You are amazing!

  |===============                                                                |  19%
| All of the answers are correct because they all represent the same ratio. If p is the
| probability of an event, the associated odds are p/(1-p).
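
The conversion between probability and odds can be sketched in a few lines of R. Both helper names below are hypothetical and are used only for this illustration.

```r
# Convert between probability and odds. Both helper names are hypothetical.
odds_from_prob <- function(p) p / (1 - p)
prob_from_odds <- function(o) o / (1 + o)

print(odds_from_prob(2/3))    # post season: 2 to 1
print(odds_from_prob(0.55))   # regular season: same ratio as 11/9 and 55/45
print(prob_from_odds(11/9))   # converting the odds back to a probability
```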

...

  |==================                                                             |  23%
| Now suppose we want to see how the Ravens' odds depend on their offense. In other
| words, we want to model how p, or some function of it, depends on how many points the
| Ravens are able to score. Of course, we can't observe p, we can only observe wins,
| losses, and the associated scores. Here is a Box plot of one season's worth of such
| observations.

...

  |=====================                                                          |  27%
| We can see that the Ravens tend to win more when they score more points. In fact,
| about 3/4 of their losses are at or below a certain score and about 3/4 of their wins
| are at or above it. What score am I talking about? (Remember that the purple boxes
| represent 50% of the samples, and the "T's" 25%.)

1: 30
2: 40
3: 23
4: 18

Selection: 3

| Your dedication is inspiring!

  |========================                                                       |  31%
| There were 9 games in which the Ravens scored 23 points or less. They won 4 of these
| games, so we might guess their probability of winning, given that they score 23 points
| or less, is about 1/2.

...

  |===========================                                                    |  35%
| There were 11 games in which the Ravens scored 24 points or more. They won all but one
| of these. Verify this by checking the data yourself. It is in a data frame called
| ravenData. Look at it by typing either ravenData or View(ravenData).

> ravenData
   ravenWinNum ravenWin ravenScore
1            1        W          9
2            0        L         13
3            1        W         13
4            1        W         16
5            0        L         17
6            0        L         17
7            0        L         20
8            0        L         23
9            1        W         23
10           1        W         24
11           1        W         25
12           1        W         28
13           0        L         28
14           1        W         31
15           1        W         31
16           1        W         33
17           1        W         34
18           1        W         38
19           1        W         44
20           1        W         55

| Perseverance, that's the answer.
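
Instead of counting rows by eye, the win counts on either side of 23 points can be tabulated. ravenData only exists inside swirl, so the sketch below retypes the two numeric columns from the listing above; the variable names low, tab, and wins_by_group are made up for this example.

```r
# ravenData is provided by swirl; these values are typed in from the listing above.
ravenData <- data.frame(
  ravenWinNum = c(1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1),
  ravenScore  = c(9, 13, 13, 16, 17, 17, 20, 23, 23, 24,
                  25, 28, 28, 31, 31, 33, 34, 38, 44, 55)
)

low <- ravenData$ravenScore <= 23          # TRUE for games with 23 points or less
# Rows: score group; columns: 0 = loss, 1 = win.
tab <- table(scored_23_or_less = low, win = ravenData$ravenWinNum)
print(tab)

wins_by_group <- tapply(ravenData$ravenWinNum, low, sum)   # wins per score group
print(wins_by_group)
```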

  |==============================                                                 |  38%
| We see a fairly rapid transition in the Ravens' win/loss record between 23 and 28
| points. At 23 points and below they win about half their games, between 24 and 28
| points they win 3 of 4, and above 28 points they win them all. From this, we get a
| very crude idea of the correspondence between points scored and the probability of a
| win. We get an S shaped curve, a graffiti S anyway.

...

  |=================================                                              |  42%
| Of course, we would expect a real curve to be smoother. We would not, for instance,
| expect the Ravens to win half the games in which they scored zero points, nor to win
| all the games in which they scored more than 28. A generalized linear model which has
| these properties supposes that the log odds of a win depend linearly on the score.
| That is, log(p/(1-p)) = b0 + b1*score. The link function, log(p/(1-p)), is called the
| logit, and the process of finding the best b0 and b1, is called logistic regression.

...

  |====================================                                           |  46%
| The "best" b0 and b1 are those which maximize the likelihood of the actual win/loss
| record. Based on the score of a game, b0 and b1 give us a log odds, which we can
| convert to a probability, p, of a win. We would like p to be high for the scores of
| winning games, and low for the scores of losses.

...

  |========================================                                       |  50%
| We can use R's glm() function to find the b0 and b1 which maximize the likelihood of
| our observations. Referring back to the data frame, we want to predict the binary
| outcomes, ravenWinNum, from the points scored, ravenScore. This corresponds to the
| formula, ravenWinNum ~ ravenScore, which is the first argument to glm. The second
| argument, family, describes the outcomes, which in our case are binomial. The third
| argument is the data, ravenData. Call glm with these parameters and store the result
| in a variable named mdl.

> mdl <- glm(ravenWinNum ~ ravenScore, family = binomial, data = ravenData)

| You got it right!

  |===========================================                                    |  54%
| The probabilities estimated by logistic regression using glm() are represented by the
| black curve. It is more reasonable than our crude estimate in several respects: It
| increases smoothly with score, it estimates that 15 points give the Ravens a 50%
| chance of winning, that 28 points give them an 80% chance, and that 55 points make a
| win very likely (98%) but not absolutely certain.

...

  |==============================================                                 |  58%
| The model is less credible at scores lower than 9. Of course, there is no data in that
| region; the Ravens scored at least 9 points in every game. The model gives them a 33%
| chance of winning if they score 9 points, which may be reasonable, but it also gives
| them a 16% chance of winning even if they score no points! We can use R's predict()
| function to see the model's estimates for lower scores. The function will take mdl and
| a data frame of scores as arguments and will return log odds for the given scores. Call
| predict(mdl, data.frame(ravenScore=c(0, 3, 6))) and store the result in a variable
| called lodds.

> lodds <- predict(mdl, data.frame(ravenScore=c(0, 3, 6)))

| That's correct!

  |=================================================                              |  62%
| Since predict() gives us log odds, we will have to convert to probabilities. To
| convert log odds to probabilities use exp(lodds)/(1+exp(lodds)). Don't bother to store
| the result in a variable. We won't need it.

> exp(lodds)/(1+exp(lodds))
        1         2         3 
0.1570943 0.2041977 0.2610505 

| You are amazing!
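
The expression exp(lodds)/(1+exp(lodds)) is the logistic function, which base R provides as plogis(); its inverse, the logit, is qlogis(). The sketch below compares the two on a few hypothetical log-odds values, since lodds itself only exists in the swirl session.

```r
# Hypothetical log-odds values, chosen only for illustration.
lodds <- c(-1.68, 0, 1.3)

by_hand  <- exp(lodds) / (1 + exp(lodds))
built_in <- plogis(lodds)                  # logistic function from the stats package
print(rbind(by_hand, built_in))

# qlogis() is the logit, so it recovers the log odds from the probabilities.
print(qlogis(built_in))
```

Inside the lesson, predict(mdl, data.frame(ravenScore=c(0, 3, 6)), type = "response") should return the probabilities directly, without the manual conversion.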

  |====================================================                           |  65%
| As you can see, a person could make a lot of money betting against this model. When
| the Ravens score no points, the model might like 16 to 84 odds. As it turns out,
| though, the model is not that sure of itself. Typing summary(mdl) you can see the
| estimated coefficients are both within 2 standard errors of zero. Check out the
| summary now.

> summary(mdl)

Call:
glm(formula = ravenWinNum ~ ravenScore, family = binomial, data = ravenData)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7575  -1.0999   0.5305   0.8060   1.4947  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.68001    1.55412  -1.081     0.28
ravenScore   0.10658    0.06674   1.597     0.11

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 24.435  on 19  degrees of freedom
Residual deviance: 20.895  on 18  degrees of freedom
AIC: 24.895

Number of Fisher Scoring iterations: 5


| You are amazing!

  |=======================================================                        |  69%
| The coefficients estimate log odds as a linear function of points scored. They have a
| natural interpretation in terms of odds because, if b0 + b1*score estimates log odds,
| then exp(b0 + b1*score)=exp(b0)exp(b1*score) estimates odds. Thus exp(b0) is the odds
| of winning with a score of 0 (in our case 16/84), and exp(b1) is the factor by which
| the odds of winning increase with every point scored. In our case exp(b1) =
| exp(0.10658) = 1.11. In other words, the odds of winning increase by 11% for each
| point scored.
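
The arithmetic in this step can be checked directly. The sketch below copies the two coefficients from the summary(mdl) output rather than refitting the model; the variable names are arbitrary.

```r
# Coefficients copied from the summary(mdl) output above.
b0 <- -1.68001
b1 <- 0.10658

odds_at_zero <- exp(b0)    # odds of winning with 0 points
odds_factor  <- exp(b1)    # multiplicative change in odds per additional point
print(c(odds_at_zero = odds_at_zero, sixteen_to_84 = 16 / 84,
        odds_factor = odds_factor))

# The effect compounds: ten extra points multiply the odds by exp(b1)^10.
print(odds_factor^10)
```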

...

  |==========================================================                     |  73%
| However, the coefficients have relatively large standard errors. A 95% confidence
| interval is roughly 2 standard errors either side of a coefficient. R's function
| confint() will find the exact lower and upper bounds to the 95% confidence intervals
| for the coefficients b0 and b1. To get the corresponding intervals for exp(b0) and
| exp(b1) we would just exponentiate the output of confint(mdl). Do this now.

> exp(confint(mdl))
Waiting for profiling to be done...
                  2.5 %   97.5 %
(Intercept) 0.005674966 3.106384
ravenScore  0.996229662 1.303304

| That's a job well done!
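
The "roughly 2 standard errors" rule mentioned above corresponds to a Wald interval, estimate plus or minus about 1.96 standard errors. As a rough sketch, it can be computed from the numbers in summary(mdl) and compared with the profile-based bounds that confint() printed. The variable names are arbitrary.

```r
# Estimates and standard errors copied from the summary(mdl) output above.
est <- c("(Intercept)" = -1.68001, ravenScore = 0.10658)
se  <- c("(Intercept)" =  1.55412, ravenScore = 0.06674)

z <- qnorm(0.975)                                   # about 1.96
wald <- cbind(lower = est - z * se, upper = est + z * se)
print(exp(wald))                                    # intervals on the odds scale
```

The Wald bounds are not identical to the confint() output, which is expected with only 20 games, but both intervals for exp(b1) contain 1.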

  |=============================================================                  |  77%
| What is the 2.5% confidence bound on the odds of winning with a score of 0 points?

1: 0.996229662
2: 0.005674966
3: 2.5%

Selection: 1

| Not exactly. Give it another go.

| It's very small.

1: 0.996229662
2: 2.5%
3: 0.005674966

Selection: 3

| You got it right!

  |================================================================               |  81%
| The lower confidence bound on the odds of winning with a score of 0 is near zero,
| which seems much more realistic than the 16/84 figure of the maximum likelihood model.
| Now look at the lower bound on exp(b1), the exponentiated coefficient of ravenScore.
| How does it suggest the odds of winning will be affected by each additional point
| scored?

1: They will decrease slightly
2: They will increase by 30%
3: They will increase slightly

Selection: 1

| Excellent job!

  |===================================================================            |  85%
| The lower confidence bound on exp(b1) suggests that the odds of winning would decrease
| slightly with every additional point scored. This is obviously unrealistic. Of course,
| confidence intervals are based on large sample assumptions and our sample consists of
| only 20 games. In fact, the GLM version of analysis of variance will show that if we
| ignore scores altogether, we don't do much worse.

...

  |======================================================================         |  88%
| Linear regression minimizes the squared difference between predicted and actual
| observations, i.e., minimizes the variance of the residual. If an additional predictor
| significantly reduces the residual's variance, the predictor is deemed important.
| Deviance extends this idea to generalized linear regression, using (negative) log
| likelihoods in place of variance. (For a detailed explanation, see the slides and
| lectures.) To see the analysis of deviance for our model, type anova(mdl).

> anova(mdl)
Analysis of Deviance Table

Model: binomial, link: logit

Response: ravenWinNum

Terms added sequentially (first to last)


           Df Deviance Resid. Df Resid. Dev
NULL                          19     24.435
ravenScore  1   3.5398        18     20.895

| You are amazing!

  |=========================================================================      |  92%
| The value, 3.5398, labeled as the deviance of ravenScore, is actually the difference
| between the deviance of our model, which includes a slope, and that of a model which
| includes only an intercept, b0. This value is centrally chi-square distributed (for
| large samples) with 1 degree of freedom (2 parameters minus 1 parameter, or
| equivalently 19-18.) The null hypothesis is that the coefficient of ravenScore is
| zero. To confidently reject this hypothesis, we would want 3.5398 to be larger than
| the 95th percentile of chi-square distribution with one degree of freedom. Use
| qchisq(0.95, 1) to compute the threshold of this percentile.

> qchisq(0.95, 1)
[1] 3.841459

| Great job!
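
Comparing 3.5398 with the 95th percentile is equivalent to asking whether the chi-square p-value is below 0.05. A minimal sketch of that p-value, using only the deviance difference printed by anova(mdl):

```r
dev_drop  <- 3.5398                        # deviance difference from anova(mdl) above
threshold <- qchisq(0.95, df = 1)
p_value   <- pchisq(dev_drop, df = 1, lower.tail = FALSE)
print(c(threshold = threshold, p_value = p_value))
```

Inside the lesson, anova(mdl, test = "Chisq") should add this p-value as an extra column of the table.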

  |============================================================================   |  96%
| As you can see, 3.5398 is close to but less than the 95th percentile threshold,
| 3.841459, hence would be regarded as consistent with the null hypothesis at the
| conventional 5% level. In other words, ravenScore adds very little: a model which
| just guesses that the Ravens win with probability 70% (their actual record that
| season), or odds of 7 to 3, is almost as good. If you like, you can verify this using
| mdl0 <- glm(ravenWinNum ~ 1, binomial, ravenData), but this concludes the Binary
| Outcomes example. Thank you.
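
The verification suggested at the end of the lesson can be sketched as follows. ravenData is retyped from the listing printed earlier because it only exists inside swirl; p0 and dev_drop are names made up for this example.

```r
# ravenData is provided by swirl; these values are typed in from the earlier listing.
ravenData <- data.frame(
  ravenWinNum = c(1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1),
  ravenScore  = c(9, 13, 13, 16, 17, 17, 20, 23, 23, 24,
                  25, 28, 28, 31, 31, 33, 34, 38, 44, 55)
)

mdl  <- glm(ravenWinNum ~ ravenScore, family = binomial, data = ravenData)
mdl0 <- glm(ravenWinNum ~ 1, family = binomial, data = ravenData)

p0 <- unname(plogis(coef(mdl0)))            # intercept-only fit: overall win rate
dev_drop <- deviance(mdl0) - deviance(mdl)  # same quantity as in anova(mdl)
print(c(win_prob = p0, dev_drop = dev_drop))

print(anova(mdl0, mdl, test = "Chisq"))     # analysis of deviance for the two models
```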

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

| You are quite good my friend!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 12.

3.14 Lesson 13: Count Outcomes

| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Regression Models
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 2

| Please choose a lesson, or type 0 to return to course menu.

 1: Introduction
 2: Residuals
 3: Least Squares Estimation
 4: Residual Variation
 5: Introduction to Multivariable Regression
 6: MultiVar Examples
 7: MultiVar Examples2
 8: MultiVar Examples3
 9: Residuals Diagnostics and Variation
10: Variance Inflation Factors
11: Overfitting and Underfitting
12: Binary Outcomes
13: Count Outcomes

Selection: 13
  |                                                                               |   0%

| Count Outcomes. (Slides for this and other Data Science courses may be found at github
| https://github.com/DataScienceSpecialization/courses. If you care to use them, they
| must be downloaded as a zip file and viewed locally. This lesson corresponds to
| Regression_Models/03_03_countOutcomes.)

...

  |==                                                                             |   3%
| Many data take the form of counts. These might be calls to a call center, number of
| flu cases in an area, or number of cars that cross a bridge. Data may also be in the
| form of rates, e.g., percent of children passing a test. In this lesson we will use
| Poisson regression to analyze daily visits to a web site as the web site's popularity
| grows, and to analyze the percent of visits which are due to references from a
| different site.

...

  |=====                                                                          |   6%
| Visits to a web site tend to occur independently, one at a time, at a certain average
| rate. The Poisson distribution describes random processes of this type. A Poisson
| process is characterized by a single parameter, the expected rate of occurrence, which
| is usually called lambda. In our case, lambda will be expected visits per day. Of
| course, as the web site becomes more popular, lambda will grow. In other words, our
| lambda will depend on time. We will use Poisson regression to model this dependence.

...

  |=======                                                                        |   9%
| Somewhat remarkably, the variance of a Poisson process has the same value as its mean,
| lambda. You can quickly illustrate this by generating, say, n=1000 samples from a
| Poisson process using R's rpois(n, lambda) and calculating the sample variance. For
| example, type var(rpois(1000, 50)). The sample variance won't be exactly equal to the
| theoretical value, of course, but it will be fairly close.

> var(rpois(1000, 50))
[1] 50.71414

| Keep up the great work!
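
The same check can be repeated for several values of lambda at once. This is only a sketch; the seed, the sample size, and the lambda values are arbitrary choices.

```r
set.seed(42)                      # arbitrary seed for reproducibility
lambdas <- c(2, 10, 50)

# One column per lambda, with the sample mean and sample variance side by side.
sim <- sapply(lambdas, function(lambda) {
  x <- rpois(10000, lambda)
  c(lambda = lambda, mean = mean(x), var = var(x))
})
print(t(sim))
```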

  |==========                                                                     |  12%
| A famous theorem implies that properly normalized sums of independent, identically
| distributed random variables will tend to become normally distributed as the number of
| samples grows large. What is that theorem?

1: The Gauss-Markov BLUE Theorem
2: The Central Limit Theorem
3: The Pythagorean Theorem

Selection: 2

| That's a job well done!

  |============                                                                   |  16%
| The counts generated by a Poisson process are, strictly speaking, slightly different
| than the normalized sums of the Central Limit Theorem. However, the counts in a given
| period of time will represent sums of larger numbers of terms as lambda increases. In
| fact, it can be formally shown that for large lambda a Poisson distribution is well
| approximated by a normal. The figure illustrates this effect. It shows progression
| from a sparse, asymmetric, Poisson probability mass function on the left, to a dense,
| bell-shaped curve on the right as lambda varies from 2 to 100.
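
The figure is not reproduced in this log, but the effect can be sketched numerically by comparing the Poisson probability mass function with a normal density of the same mean and variance. The helper max_gap and the range of x values are assumptions made for this illustration.

```r
# Largest gap between the Poisson pmf and the matching normal density,
# relative to the peak of the pmf. `max_gap` is a hypothetical helper.
max_gap <- function(lambda) {
  x <- 0:(4 * lambda + 20)                            # wide enough to cover the mass
  pmf <- dpois(x, lambda)
  norm_dens <- dnorm(x, mean = lambda, sd = sqrt(lambda))
  max(abs(pmf - norm_dens)) / max(pmf)
}

gaps <- sapply(c(2, 10, 100), max_gap)
names(gaps) <- c("lambda=2", "lambda=10", "lambda=100")
print(gaps)   # the relative gap shrinks as lambda grows
```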

...

  |===============                                                                |  19%
| In a Poisson regression, the log of lambda is assumed to be a linear function of the
| predictors. Since we will try to model the growth of visits to a web site, the log of
| lambda will be a linear function of the date: log(lambda) = b0 + b1*date. This implies
| that the average number of hits per day, lambda, is exponential in the date: lambda =
| exp(b0)*exp(b1)^date. Exponential growth is also suggested by the smooth, black curve
| drawn through the data. Thus exp(b1) would represent the factor by which visits
| grow per day.

...

  |=================                                                              |  22%
| If you are connected to the internet right now, would you care to visit the Leek Group
| website?

Yes or No? Yes

| Type nxt() to continue
> nxt()

| Resuming lesson...

  |====================                                                           |  25%
| Our data is in a data frame named hits. Use View(hits), head(hits), or tail(hits) to
| examine the data now.

> head(hits)
        date visits simplystats
1 2011-01-01      0           0
2 2011-01-02      0           0
3 2011-01-03      0           0
4 2011-01-04      0           0
5 2011-01-05      0           0
6 2011-01-06      0           0

| All that hard work is paying off!

  |======================                                                         |  28%
| There are three columns of data labeled date, visits, and simplystats respectively.
| The simplystats column records the number of visits which are due to references from
| another site, the Simply Statistics blog. We'll come back to that column later. For
| now, we are interested in the date and visits columns. The date will be our predictor.

...

  |=========================                                                      |  31%
| Our dates are represented in terms of R's class, Date. Verify this by typing
| class(hits[,'date']), or something equivalent.

> class(hits[,'date'])
[1] "Date"

| Excellent work!

  |===========================                                                    |  34%
| R's Date class represents dates as days since or prior to January 1, 1970. They are
| essentially numbers, and to some extent can be treated as such. Dates can, for
| example, be added or subtracted, or easily converted to numbers. Type
| as.integer(head(hits[,'date'])) to see what I mean.

> as.integer(head(hits[,'date']))
[1] 14975 14976 14977 14978 14979 14980

| Your dedication is inspiring!
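
A few lines of base R illustrate the arithmetic that Date objects support. The hits data frame is not needed here; the dates are typed in directly.

```r
d <- as.Date("2011-01-01")
print(as.integer(d))                       # days since 1970-01-01
print(as.integer(as.Date("1970-01-01")))   # the origin itself
print(d + 30)                              # adding days gives another Date
print(as.Date("2012-12-31") - d)           # subtraction gives a difference in days
```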

  |==============================                                                 |  38%
| The arithmetic properties of Dates allow us to use them as predictors. We'll use
| Poisson regression to predict log(lambda) as a linear function of date in a way which
| maximizes the likelihood of the counts we actually see. Our formula will be visits ~
| date. Since our outcomes (visits) are counts, our family will be 'poisson', and our
| third argument will be the data, hits. Create such a model and store it in a variable
| called mdl using the following expression or something equivalent, mdl <- glm(visits ~
| date, poisson, hits).

> mdl <- glm(visits ~ date, poisson, hits)

| That's correct!
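
The hits data frame is only available inside swirl. To try the same call elsewhere, one option is to simulate counts whose log mean is linear in the date and check that glm() recovers the slope. Everything in the sketch below (the seed, b0_true, b1_true, the date range, and the name sim) is a hypothetical choice, not a value taken from the real data.

```r
set.seed(2011)                     # arbitrary seed
# Hypothetical parameters, NOT estimates from the real hits data.
b0_true <- -33
b1_true <- 0.0023

sim <- data.frame(date = seq(as.Date("2011-01-01"), by = "day", length.out = 731))
lambda <- exp(b0_true + b1_true * as.numeric(sim$date))   # log(lambda) linear in date
sim$visits <- rpois(nrow(sim), lambda)

sim_mdl <- glm(visits ~ date, family = poisson, data = sim)
print(coef(sim_mdl))               # should land near b0_true and b1_true
```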

  |================================                                               |  41%
| The figure suggests that our Poisson regression fits the data very well. The black
| line is the estimated lambda, or mean number of visits per day. We see that mean
| visits per day increased from around 5 in early 2011 to around 10 by 2012, and to
| around 20 by late 2013. It is approximately doubling every year.

...

  |===================================                                            |  44%
| Type summary(mdl) to examine the estimated coefficients and their significance.

> summary(mdl)

Call:
glm(formula = visits ~ date, family = poisson, data = hits)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-5.0466  -1.5908  -0.3198   0.9128  10.6545  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.275e+01  8.130e-01  -40.28   <2e-16 ***
date         2.293e-03  5.266e-05   43.55   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 5150.0  on 730  degrees of freedom
Residual deviance: 3121.6  on 729  degrees of freedom
AIC: 6069.6

Number of Fisher Scoring iterations: 5


| You are really on a roll!

  |=====================================                                          |  47%
| Both coefficients are significant, being far more than two standard errors from zero.
| The Residual deviance is also very significantly less than the Null, indicating a
| strong effect. (Recall that the difference between Null and Residual deviance is
| approximately chi-square with 1 degree of freedom.) The Intercept coefficient, b0,
| just represents log average hits on R's Date 0, namely January 1, 1970. We will ignore
| it and focus on the coefficient of date, b1, since exp(b1) will estimate the
| factor by which average visits increase per day of the site's life.

...

  |========================================                                       |  50%
| Get the 95% confidence interval for exp(b1) by exponentiating confint(mdl, 'date')

> exp(confint(mdl, 'date'))
Waiting for profiling to be done...
   2.5 %   97.5 % 
1.002192 1.002399 

| You are quite good my friend!

  |==========================================                                     |  53%
| Visits are estimated to increase by a factor of between 1.002192 and 1.002399 per day.
| That is, between 0.2192% and 0.2399% per day. This actually represents more than a
| doubling every year.

...
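
To see why a daily increase of roughly 0.22% to 0.24% means "more than a doubling every year", compound the daily factor over 365 days. This is a small arithmetic sketch on the interval printed above; the variable names are mine.

```r
# 95% confidence limits for exp(b1), copied from the output above
ci_daily <- c(lower = 1.002192, upper = 1.002399)

# Compounding the daily factor over one year
annual_factor <- ci_daily^365

# Number of days needed for mean visits to double at each rate
doubling_days <- log(2) / log(ci_daily)

print(annual_factor)
print(doubling_days)
```

Both annual factors exceed 2, and both doubling times are shorter than a year.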

  |============================================                                   |  56%
| Our model looks like a pretty good description of the data, but no model is perfect
| and we can often learn about a data generation process by looking for a model's
| shortcomings. As shown in the figure, one thing about our model is 'zero inflation' in
| the first two weeks of January 2011, before the site had any visits. The model
| systematically overestimates the number of visits during this time. A less obvious
| thing is that the standard deviation of the data may be increasing with lambda faster
| than a Poisson model allows. This possibility can be seen in the rightmost plot by
| visually comparing the spread of green dots with the standard deviation predicted by
| the model (black dashes.) Also, there are four or five bursts of popularity during
| which the number of visits far exceeds two standard deviations over average. Perhaps
| these are due to mentions on another site.

...

  |===============================================                                |  59%
| It seems that at least some of them are. The simplystats column of our data records
| the number of visits to the Leek Group site which come from the related site, Simply
| Statistics. (I.e., visits due to clicks on a link to the Leek Group which appeared in
| a Simply Statistics post.)

...

  |=================================================                              |  62%
| In the figure, the maximum number of visits occurred in late 2012. Visits from the
| Simply Statistics blog were also at their maximum that day. To find the exact date we
| can use which.max(hits[,'visits']). Do this now.

> which.max(hits[,'visits'])
[1] 704

| Great job!

  |====================================================                           |  66%
| The maximum number of visits is recorded in row 704 of our data frame. Print that row
| by typing hits[704,].

> hits[704,]
          date visits simplystats
704 2012-12-04     94          64

| Nice work!
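
The same two-step pattern (find the row index of the maximum, then print that row) can be tried on a small made-up data frame. The data below are hypothetical and are not the hits data. Note that which.max() returns the index of the first maximum when there are ties.

```r
# Hypothetical toy data; only the column names mimic the lesson's data frame
toy <- data.frame(
  date   = as.Date("2012-12-01") + 0:4,
  visits = c(10, 94, 30, 94, 5)
)

i <- which.max(toy[, "visits"])  # index of the first maximum
print(i)
print(toy[i, ])                  # the whole row, as with hits[704, ]
```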

  |======================================================                         |  69%
| The maximum number of visits, 94, occurred on December 4, 2012, of which 64 came from
| the Simply Statistics blog. We might consider the 64 visits to be a special event,
| over and above normal. Can the difference, 94-64=30 visits, be attributed to normal
| traffic as estimated by our model? To check, we will need the value of lambda on
| December 4, 2012. This will be entry 704 of the fitted.values element of our model.
| Extract mdl$fitted.values[704] and store it in a variable named lambda.

> lambda <- mdl$fitted.values[704]

| You are amazing!

  |=========================================================                      |  72%
| The number of visits explained by our model on December 4, 2012 are those of a Poisson
| random variable with mean lambda. We can find the 95th percentile of this distribution
| using qpois(.95, lambda). Try this now.

> qpois(.95, lambda)
[1] 33

| You are doing so well!

  |===========================================================                    |  75%
| So, 95% of the time we would see 33 or fewer visits, hence 30 visits would not be rare
| according to our model. It would seem that on December 4, 2012, the very high number
| of visits was due to references from Simply Statistics. To gauge the importance of
| references from Simply Statistics we may wish to model the proportion of traffic such
| references represent. Doing so will also illustrate the use of glm's parameter,
| offset, to model frequencies and proportions.

...
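
The reasoning with qpois() can be reproduced for any mean. The log does not show the fitted lambda, so the sketch below uses a hypothetical value chosen only for illustration. It shows that qpois(.95, lam) is the smallest count whose cumulative probability reaches 0.95.

```r
lam <- 25                      # hypothetical mean, not the lesson's fitted value

q95 <- qpois(0.95, lam)        # 95th percentile of Poisson(lam)
p_at_q     <- ppois(q95, lam)      # P(X <= q95), at least 0.95
p_below_q  <- ppois(q95 - 1, lam)  # P(X <= q95 - 1), still below 0.95

# Probability of seeing more than 30 visits under this hypothetical mean
p_more_than_30 <- ppois(30, lam, lower.tail = FALSE)

print(c(q95 = q95, p_at_q = p_at_q, p_below_q = p_below_q))
print(p_more_than_30)
```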

  |==============================================================                 |  78%
| A Poisson process generates counts, and counts are whole numbers, 0, 1, 2, 3, etc. A
| proportion is a fraction. So how can a Poisson process model a proportion? The trick
| is to include the denominator of the fraction, or more precisely its log, as an
| offset. Recall that in our data set, 'simplystats' is the visits from Simply
| Statistics, and 'visits' is the total number of visits. We would like to model the
| fraction simplystats/visits, but to avoid division by zero we'll actually use
| simplystats/(visits+1). A Poisson model assumes that log(lambda) is a linear
| combination of predictors. Suppose we assume that log(lambda) = log(visits+1) + b0 +
| b1*date. In other words, if we insist that the coefficient of log(visits+1) be equal
| to 1, we are predicting the log of mean visits from Simply Statistics as a proportion
| of total visits: log(lambda/(visits+1)) = b0 + b1*date.

...

  |================================================================               |  81%
| glm's parameter, offset, has precisely this effect. It fixes the coefficient of the
| offset to 1. To create a model for the proportion of visits from Simply Statistics, we
| let offset=log(visits+1). Create such a Poisson model now and store it as a variable
| called mdl2.

> mdl2 <- glm(simplystats/(visits+1) ~ date, poisson, hits)
There were 50 or more warnings (use warnings() to see the first 50)

| One more time. You can do it! Or, type info() for more options.

| Enter mdl2 <- glm(formula = simplystats ~ date, family = poisson, data = hits, offset
| = log(visits + 1)), or something equivalent.

warning messages from top-level task callback 'mini'
There were 50 or more warnings (use warnings() to see the first 50)
> mdl2 <- glm(formula = simplystats ~ date, family = poisson, data = hits, offset = log(visits + 1))

| That's the answer I was looking for.
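
My first attempt above put the fraction directly on the left side of the formula. That makes the response non-integer, which is most likely what triggered the warnings, since the Poisson family expects counts. The accepted answer keeps the count as the response and passes the log of the denominator as an offset. The sketch below uses simulated data (all names and values are hypothetical, not the hits data) to show that the offset argument and an offset() term in the formula give the same fit.

```r
set.seed(1)
n      <- 200
date   <- seq_len(n)                        # hypothetical day index
visits <- rpois(n, lambda = 20)             # hypothetical total visits
p      <- exp(-2 + 0.002 * date)            # assumed true proportion
simplystats <- rpois(n, lambda = (visits + 1) * p)
sim <- data.frame(date, visits, simplystats)

# Offset passed as an argument, as in the lesson's answer
fit_arg <- glm(simplystats ~ date, family = poisson, data = sim,
               offset = log(visits + 1))

# Equivalent form: offset() term inside the formula
fit_term <- glm(simplystats ~ date + offset(log(visits + 1)),
                family = poisson, data = sim)

same_fit <- isTRUE(all.equal(coef(fit_arg), coef(fit_term)))
print(coef(fit_arg))
print(same_fit)
```

In both forms the coefficient of log(visits + 1) is fixed at 1, so b0 + b1*date describes the log of the proportion rather than the log of the raw count.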

  |===================================================================            |  84%
| Although summary(mdl2) will show that the estimated coefficients are significantly
| different than zero, the model is actually not impressive. We can illustrate why by
| looking at December 4, 2012, once again. On that day there were 64 actual visits from
| Simply Statistics. However, according to mdl2, 64 visits would be extremely unlikely.
| You can verify this weakness in the model by finding mdl2's 95th percentile for that
| day. Recalling that December 4, 2012 was sample 704, find qpois(.95,
| mdl2$fitted.values[704]).

> qpois(.95, mdl2$fitted.values[704])
[1] 47

| Nice work!

  |=====================================================================          |  88%
| A Poisson distribution with lambda=1000 will be well approximated by a normal
| distribution. What will be the variance of that normal distribution?

1: lambda
2: the square root of lambda.
3: lambda squared

Selection: 1

| Keep working like that and you'll get there!
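
The answer (the variance equals lambda) can be checked by simulation. This is only a rough sketch with a seed and sample size of my own choosing.

```r
set.seed(42)
x <- rpois(1e5, lambda = 1000)   # simulated Poisson counts

# For a Poisson distribution the mean and the variance are both lambda,
# so the standard deviation is sqrt(lambda)
stats <- c(mean = mean(x), variance = var(x),
           sd = sd(x), sqrt_lambda = sqrt(1000))
print(stats)
```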

  |========================================================================       |  91%
| When modeling count outcomes as a Poisson process, what is modeled as a linear
| combination of the predictors?

1: The log of the mean
2: The mean
3: The counts

Selection: 1

| Your dedication is inspiring!

  |==========================================================================     |  94%
| What parameter of the glm function allows you to include a predictor whose coefficient
| is fixed to the value 1?

1: family
2: data
3: b0
4: formula
5: offset

Selection: 5

| Your dedication is inspiring!

  |=============================================================================  |  97%
| That completes the Poisson GLM example. Thanks for sticking with it. I hope we've made
| it count.

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

| You are quite good my friend!

| You've reached the end of this lesson! Returning to the main menu...

| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Regression Models
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 0

| Leaving swirl now. Type swirl() to resume.

> 

This is the end of Lesson 13, the final lesson.

4. Statistical Inference: The basics of statistical inference in R

Advanced

This intermediate to advanced level course closely follows the Statistical Inference course of the Johns Hopkins Data Science Specialization on Coursera. It introduces the student to basic concepts of statistical inference including probability, hypothesis testing, confidence intervals and p-values. It concludes with an initiation to topics of particular relevance to big data, issues of multiple testing and resampling.

5. Exploratory Data Analysis: The basics of exploring data in R

6. Other Courses

6.1 The R Programming Environment

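6.4.1 About This Swirl Course

This swirl course is authored by Roger Peng of Johns Hopkins University, which offers Data Science MOOCs (notably on Coursera). It appears to have been created as an improved version of R Programming, swirl's default course, for use in a Coursera course. The basic information is as follows.

Its first distinguishing feature is that the textbook for the Coursera course is available online.
In addition, whereas R Programming is built entirely on base R, that is, the functions installed with R itself, this course also introduces several packages from the tidyverse, which is becoming a de facto standard.
It includes most of R Programming. Parts of R Programming that are somewhat hard for beginners are postponed or omitted, perhaps partly because the tidyverse packages offer other ways to handle them. The omitted lessons are 2.9, 2.10, 2.11, 2.13, 2.14, and 2.15. Lesson 12 is probably too hard unless you have first learned a little about the tidyverse packages, especially dplyr and tidyr. For difficult steps, you can use skip() to see the answer and move on. Otherwise, this course seems better maintained than R Programming.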

Contents

  1. Setting Up Swirl: how the lesson relates to the Coursera course (choose among submitting for credit, generating a code for later, or neither)
  2. Basic Building Blocks: same as R Programming 2.1
  3. Sequences of Numbers: same as R Programming 2.3
  4. Vectors: same as R Programming 2.4
  5. Missing Values: same as R Programming 2.5
  6. Subsetting Vectors: same as R Programming 2.6
  7. Matrices and Data Frames: same as R Programming 2.7
  8. Logic: same as R Programming 2.8
  9. Workspace and Files: same as R Programming 2.2
  10. Reading Tabular Data: NEW
  11. Looking at Data: same as R Programming 2.12
  12. Data Manipulation: NEW
  13. Text Manipulation Functions: NEW
  14. Regular Expressions: NEW
  15. The stringr Package: NEW

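Note

Lesson 12: Data Manipulation appears to have a bug. In the last part (around 88%), even skip() stops working. Earlier in the lesson there also seem to be steps where a correct answer is not accepted. Retyping the same input makes it pass, so the cause may be something like a stray control character. For details on the bug, see below.

Below, I comment only on the places where you are likely to stumble.

6.4.1 From Installation to Lesson 1: Setting Up Swirl

First, declare that you will use swirl with library(swirl). This loads swirl.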

> library(swirl)

| Hi! I see that you have some variables saved in your workspace. To keep things running
| smoothly, I recommend you clean up before starting swirl.

| Type ls() to see a list of the variables in your workspace. Then, type rm(list=ls()) to
| clear your workspace.

| Type swirl() when you are ready to begin.

The following command installs this swirl course.

> swirl::install_course("The R Programming Environment")
  |=================================================================================| 100%

| Course installed successfully!

swirl is always started with the command swirl().

> swirl()

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you
| did then. If you are new, call yourself something unique.

What shall I call you? SL

Select a course. Option 3 is this course, option 2 is the basic swirl course introduced at the beginning, and option 1 is introduced later.

| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 3

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 1

Lesson 1 begins here. It is essentially about the connection to Coursera, so you may skip Lesson 1.

  |                                                                                 |   0%

| Welcome to the swirl component of The R Programming Environment course on Coursera. This
| will be a very short lesson designed to test swirl's connection to Coursera's servers.

...

  |====================                                                             |  25%
| At the end of every lesson you will be given a choice to submit to Coursera or to have a
| code generated. If this test succeeds then you should continue to submit your progress
| to Coursera. However if this test fails you should always choose to have a code
| generated.

...

  |========================================                                         |  50%
| Copy the generated code and then paste it into the quiz question on Coursera that
| corresponds to the lesson you're working on in swirl in order to get credit.

...

  |=============================================================                    |  75%
| Let's test submitting to Coursera now. In the next question choose Yes. If you are not
| able to connect, then you know that you should always ask for a code at the end of the
| lesson. If you are able to connect then you should be able to submit your progress to
| swirl easily in the future!

...

  |=================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Generate Code
2: No
3: Yes

Unless you are doing this as a Coursera course assignment, option 2 is fine. As a trial, I chose 1.

Selection: 1
Submit the following code as the answer
to a quiz question on Coursera.

#########################

_code is given_

#########################

| You are really on a roll!

| You've reached the end of this lesson! Returning to the main menu...

This is the end of Lesson 1.

6.4.2 Lesson 2: Basic Building Blocks

This is the same as R Programming 2.1.

| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 3

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 2

  |                                                                                 |   0%

| In this lesson, we will explore some basic building blocks of the R programming
| language.

...

  |==                                                                               |   3%
| If at any point you'd like more information on a particular topic related to R, you can
| type help.start() at the prompt, which will open a menu of resources (either within
| RStudio or your default web browser, depending on your setup). Alternatively, a simple
| web search often yields the answer you're looking for.

...

  |====                                                                             |   5%
| In its simplest form, R can be used as an interactive calculator. Type 5 + 7 and press
| Enter.

> 5 + 7
[1] 12

| All that practice is paying off!

  |======                                                                           |   8%
| R simply prints the result of 12 by default. However, R is a programming language and
| often the reason we use a programming language as opposed to a calculator is to automate
| some process or avoid unnecessary repetition.

...

  |=========                                                                        |  11%
| In this case, we may want to use our result from above in a second calculation. Instead
| of retyping 5 + 7 every time we need it, we can just create a new variable that stores
| the result.

...

  |===========                                                                      |  13%
| The way you assign a value to a variable in R is by using the assignment operator, which
| is just a 'less than' symbol followed by a 'minus' sign. It looks like this: <-

...

  |=============                                                                    |  16%
| Think of the assignment operator as an arrow. You are assigning the value on the right
| side of the arrow to the variable name on the left side of the arrow.

...

  |===============                                                                  |  18%
| To assign the result of 5 + 7 to a new variable called x, you type x <- 5 + 7. This can
| be read as 'x gets 5 plus 7'. Give it a try now.

> x <- 5+7

| You nailed it! Good job!

  |=================                                                                |  21%
| You'll notice that R did not print the result of 12 this time. When you use the
| assignment operator, R assumes that you don't want to see the result immediately, but
| rather that you intend to use the result for something else later on.

...

  |===================                                                              |  24%
| To view the contents of the variable x, just type x and press Enter. Try it now.

> x
[1] 12

| You got it right!

  |=====================                                                            |  26%
| Now, store the result of x - 3 in a new variable called y.

> y <- x - 3

| Nice work!

  |=======================                                                          |  29%
| What is the value of y? Type y to find out.

> y
[1] 9

| All that hard work is paying off!

  |==========================                                                       |  32%
| Now, let's create a small collection of numbers called a vector. Any object that
| contains data is called a data structure and numeric vectors are the simplest type of
| data structure in R. In fact, even a single number is considered a vector of length one.

...

  |============================                                                     |  34%
| The easiest way to create a vector is with the c() function, which stands for
| 'concatenate' or 'combine'. To create a vector containing the numbers 1.1, 9, and 3.14,
| type c(1.1, 9, 3.14). Try it now and store the result in a variable called z.

> z <- c(1.1, 9, 3.14)

| You nailed it! Good job!

  |==============================                                                   |  37%
| Anytime you have questions about a particular function, you can access R's built-in help
| files via the `?` command. For example, if you want more information on the c()
| function, type ?c without the parentheses that normally follow a function name. Give it
| a try.

> ?c

| You nailed it! Good job!

  |================================                                                 |  39%
| Type z to view its contents. Notice that there are no commas separating the values in
| the output.

> z
[1] 1.10 9.00 3.14

| You are really on a roll!

  |==================================                                               |  42%
| You can combine vectors to make a new vector. Create a new vector that contains z, 555,
| then z again in that order. Don't assign this vector to a new variable, so that we can
| just see the result immediately.

> c(z, 555, z)
[1]   1.10   9.00   3.14 555.00   1.10   9.00   3.14

| You are doing so well!

  |====================================                                             |  45%
| Numeric vectors can be used in arithmetic expressions. Type the following to see what
| happens: z * 2 + 100.

> z * 2 + 100
[1] 102.20 118.00 106.28

| You got it right!

  |======================================                                           |  47%
| First, R multiplied each of the three elements in z by 2. Then it added 100 to each
| element to get the result you see above.

...

  |========================================                                         |  50%
| Other common arithmetic operators are `+`, `-`, `/`, and `^` (where x^2 means 'x
| squared'). To take the square root, use the sqrt() function and to take the absolute
| value, use the abs() function.

...

  |===========================================                                      |  53%
| Take the square root of z - 1 and assign it to a new variable called my_sqrt.

> my_sqrt <- sqrt(z - 1)

| That's a job well done!

  |=============================================                                    |  55%
| Before we view the contents of the my_sqrt variable, what do you think it contains?

1: a vector of length 3
2: a single number (i.e a vector of length 1)
3: a vector of length 0 (i.e. an empty vector)

Selection: 1

| Nice work!

  |===============================================                                  |  58%
| Print the contents of my_sqrt.

> my_sqrt
[1] 0.3162278 2.8284271 1.4628739

| You are doing so well!

  |=================================================                                |  61%
| As you may have guessed, R first subtracted 1 from each element of z, then took the
| square root of each element. This leaves you with a vector of the same length as the
| original vector z.

...

  |===================================================                              |  63%
| Now, create a new variable called my_div that gets the value of z divided by my_sqrt.

> my_div <- z / my_sqrt

| That's correct!

  |=====================================================                            |  66%
| Which statement do you think is true?

1: my_div is undefined
2: The first element of my_div is equal to the first element of z divided by the first element of my_sqrt, and so on...
3: my_div is a single number (i.e a vector of length 1)

Selection: 2

| All that hard work is paying off!

  |=======================================================                          |  68%
| Go ahead and print the contents of my_div.

> my_div
[1] 3.478505 3.181981 2.146460

| Perseverance, that's the answer.

  |==========================================================                       |  71%
| When given two vectors of the same length, R simply performs the specified arithmetic
| operation (`+`, `-`, `*`, etc.) element-by-element. If the vectors are of different
| lengths, R 'recycles' the shorter vector until it is the same length as the longer
| vector.

...

  |============================================================                     |  74%
| When we did z * 2 + 100 in our earlier example, z was a vector of length 3, but
| technically 2 and 100 are each vectors of length 1.

...

  |==============================================================                   |  76%
| Behind the scenes, R is 'recycling' the 2 to make a vector of 2s and the 100 to make a
| vector of 100s. In other words, when you ask R to compute z * 2 + 100, what it really
| computes is this: z * c(2, 2, 2) + c(100, 100, 100).

...

  |================================================================                 |  79%
| To see another example of how this vector 'recycling' works, try adding c(1, 2, 3, 4)
| and c(0, 10). Don't worry about saving the result in a new variable.

> c(1, 2, 3, 4) + c(0, 10)
[1]  1 12  3 14

| You got it right!

  |==================================================================               |  82%
| If the length of the shorter vector does not divide evenly into the length of the longer
| vector, R will still apply the 'recycling' method, but will throw a warning to let you
| know something fishy might be going on.

...

  |====================================================================             |  84%
| Try c(1, 2, 3, 4) + c(0, 10, 100) for an example.

> c(1, 2, 3, 4) + c(0, 10, 100)
[1]   1  12 103   4
Warning message:
In c(1, 2, 3, 4) + c(0, 10, 100) :
  longer object length is not a multiple of shorter object length

| All that hard work is paying off!
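
Recycling can be made explicit with rep(). The sketch below (variable names are mine) repeats the shorter vector out to the length of the longer one and confirms that the result matches what R computes on its own, both when the lengths divide evenly and when they do not.

```r
a <- c(1, 2, 3, 4)

# Case 1: length 2 divides length 4, so no warning is issued
b <- c(0, 10)
even_match <- identical(a + b, a + rep(b, length.out = length(a)))

# Case 2: length 3 does not divide length 4; R still recycles but warns.
# suppressWarnings() is used here only to keep the output quiet.
b3 <- c(0, 10, 100)
uneven <- suppressWarnings(a + b3)
uneven_match <- identical(uneven, a + rep(b3, length.out = length(a)))

print(uneven)
print(c(even_match = even_match, uneven_match = uneven_match))
```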

  |======================================================================           |  87%
| Before concluding this lesson, I'd like to show you a couple of time-saving tricks.

...

  |========================================================================         |  89%
| Earlier in the lesson, you computed z * 2 + 100. Let's pretend that you made a mistake
| and that you meant to add 1000 instead of 100. You could either re-type the expression,
| or...

...

  |===========================================================================      |  92%
| In many programming environments, the up arrow will cycle through previous commands. Try
| hitting the up arrow on your keyboard until you get to this command (z * 2 + 100), then
| change 100 to 1000 and hit Enter. If the up arrow doesn't work for you, just type the
| corrected command.

> z * 2 + 1000
[1] 1002.20 1018.00 1006.28

| That's the answer I was looking for.

  |=============================================================================    |  95%
| Finally, let's pretend you'd like to view the contents of a variable that you created
| earlier, but you can't seem to remember if you named it my_div or myDiv. You could try
| both and see what works, or...

...

  |===============================================================================  |  97%
| You can type the first two letters of the variable name, then hit the Tab key (possibly
| more than once). Most programming environments will provide a list of variables that
| you've created that begin with 'my'. This is called auto-completion and can be quite
| handy when you have many variables in your workspace. Give it a try. (If auto-completion
| doesn't work for you, just type my_div and press Enter.)

> my_div
[1] 3.478505 3.181981 2.146460

| Keep up the great work!

  |=================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No
3: Generate Code

Selection: 3
Submit the following code as the answer
to a quiz question on Coursera.

#########################

_code is given_

#########################

| Nice work!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 2.

6.4.3 Lesson 3: Sequences of Numbers

This is the same as R Programming 2.3.


| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 3

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 3

  |                                                                                 |   0%

| In this lesson, you'll learn how to create sequences of numbers in R.

...

  |====                                                                             |   4%
| The simplest way to create a sequence of numbers in R is by using the `:` operator. Type
| 1:20 to see how it works.

> 1:20
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

| That's a job well done!

  |=======                                                                          |   9%
| That gave us every integer between (and including) 1 and 20. We could also use it to
| create a sequence of real numbers. For example, try pi:10.

> pi:10
[1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593

| Your dedication is inspiring!

  |===========                                                                      |  13%
| The result is a vector of real numbers starting with pi (3.142...) and increasing in
| increments of 1. The upper limit of 10 is never reached, since the next number in our
| sequence would be greater than 10.

...

  |==============                                                                   |  17%
| What happens if we do 15:1? Give it a try to find out.

> 15:1
 [1] 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1

| You're the best!

  |==================                                                               |  22%
| It counted backwards in increments of 1! It's unlikely we'd want this behavior, but
| nonetheless it's good to know how it could happen.

...

  |=====================                                                            |  26%
| Remember that if you have questions about a particular R function, you can access its
| documentation with a question mark followed by the function name: ?function_name_here.
| However, in the case of an operator like the colon used above, you must enclose the
| symbol in backticks like this: ?`:`. (NOTE: The backtick (`) key is generally located in
| the top left corner of a keyboard, above the Tab key. If you don't have a backtick key,
| you can use regular quotes.)

...

  |=========================                                                        |  30%
| Pull up the documentation for `:` now.

> ?`:`

| That's a job well done!

  |============================                                                     |  35%
| Often, we'll desire more control over a sequence we're creating than what the `:`
| operator gives us. The seq() function serves this purpose.

...

  |================================                                                 |  39%
| The most basic use of seq() does exactly the same thing as the `:` operator. Try seq(1,
| 20) to see this.

> seq(1,20)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

| Excellent work!

  |===================================                                              |  43%
| This gives us the same output as 1:20. However, let's say that instead we want a vector
| of numbers ranging from 0 to 10, incremented by 0.5. seq(0, 10, by=0.5) does just that.
| Try it out.

> seq(0, 10, by=0.5)
 [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0
[18]  8.5  9.0  9.5 10.0

| You are amazing!

  |=======================================                                          |  48%
| Or maybe we don't care what the increment is and we just want a sequence of 30 numbers
| between 5 and 10. seq(5, 10, length=30) does the trick. Give it a shot now and store the
| result in a new variable called my_seq.

> my_seq <- seq(5, 10, length=30)

| All that practice is paying off!
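
In the call above, length=30 works because R partially matches it to the full argument name length.out. The sketch below (the names other than my_seq are mine) checks that both spellings give the same vector and that the spacing is (10 - 5) / 29, since 30 points have 29 gaps.

```r
my_seq   <- seq(5, 10, length = 30)      # as typed in the lesson
full_arg <- seq(5, 10, length.out = 30)  # full argument name

same_result <- identical(my_seq, full_arg)
step <- diff(my_seq)                     # 29 gaps between 30 points

print(same_result)
print(range(step))
```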

  |==========================================                                       |  52%
| To confirm that my_seq has length 30, we can use the length() function. Try it now.

> length(my_seq)
[1] 30

| Excellent job!

  |==============================================                                   |  57%
| Let's pretend we don't know the length of my_seq, but we want to generate a sequence of
| integers from 1 to N, where N represents the length of the my_seq vector. In other
| words, we want a new vector (1, 2, 3, ...) that is the same length as my_seq.

...

  |=================================================                                |  61%
| There are several ways we could do this. One possibility is to combine the `:` operator
| and the length() function like this: 1:length(my_seq). Give that a try.

> 1:length(my_seq)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
[30] 30

| That's a job well done!

  |=====================================================                            |  65%
| Another option is to use seq(along.with = my_seq). Give that a try.

> seq(along.with = my_seq)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
[30] 30

| Great job!

  |========================================================                         |  70%
| However, as is the case with many common tasks, R has a separate built-in function for
| this purpose called seq_along(). Type seq_along(my_seq) to see it in action.

> seq_along(my_seq)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
[30] 30

| That's the answer I was looking for.
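
Note: the three approaches above give the same result for my_seq, but they differ when the vector is empty. A small sketch, assuming a hypothetical zero-length vector `empty` that does not appear in the lesson:

```r
empty <- numeric(0)        # hypothetical zero-length vector

bad  <- 1:length(empty)    # 1 0  (the `:` operator counts down from 1 to 0)
good <- seq_along(empty)   # integer(0)

# A loop over seq_along() therefore runs zero times for empty input
n_iter <- 0
for (i in seq_along(empty)) n_iter <- n_iter + 1
n_iter                     # 0
```

This is one practical reason to prefer seq_along() over 1:length() when the length of the input is not known in advance.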

  |============================================================                     |  74%
| There are often several approaches to solving the same problem, particularly in R.
| Simple approaches that involve less typing are generally best. It's also important for
| your code to be readable, so that you and others can figure out what's going on without
| too much hassle.

...

  |===============================================================                  |  78%
| If R has a built-in function for a particular task, it's likely that function is highly
| optimized for that purpose and is your best option. As you become a more advanced R
| programmer, you'll design your own functions to perform tasks when there are no better
| options. We'll explore writing your own functions in future lessons.

...

  |===================================================================              |  83%
| One more function related to creating sequences of numbers is rep(), which stands for
| 'replicate'. Let's look at a few uses.

...

  |======================================================================           |  87%
| If we're interested in creating a vector that contains 40 zeros, we can use rep(0, times
| = 40). Try it out.

> rep(0, times = 40)
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

| You are amazing!

  |==========================================================================       |  91%
| If instead we want our vector to contain 10 repetitions of the vector (0, 1, 2), we can
| do rep(c(0, 1, 2), times = 10). Go ahead.

> rep(c(0, 1, 2), times = 10)
 [1] 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2

| You are really on a roll!

  |=============================================================================    |  96%
| Finally, let's say that rather than repeating the vector (0, 1, 2) over and over again,
| we want our vector to contain 10 zeros, then 10 ones, then 10 twos. We can do this with
| the `each` argument. Try rep(c(0, 1, 2), each = 10).

> rep(c(0, 1, 2), each = 10)
 [1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2

| You are amazing!
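
Note: `times` and `each` can also be combined, and `times` can be a vector. The sketch below is an addition to the lesson; the variable names are illustrative assumptions.

```r
base <- c(0, 1, 2)

whole   <- rep(base, times = 2)            # 0 1 2 0 1 2
grouped <- rep(base, each = 2)             # 0 0 1 1 2 2

# `each` is applied first, then the result is repeated `times` times
both <- rep(base, each = 2, times = 2)     # 0 0 1 1 2 2 0 0 1 1 2 2

# A `times` vector of the same length as `base` repeats each element separately
per_element <- rep(base, times = c(1, 2, 3))   # 0 1 1 2 2 2
```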

  |=================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No
3: Generate Code

Selection: 2

| Excellent work!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 3.

6.4.4 Lesson 4: Vectors

Same as R Programming 2.4.


| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 3

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 4

  |                                                                                 |   0%

| The simplest and most common data structure in R is the vector.

...

  |==                                                                               |   3%
| Vectors come in two different flavors: atomic vectors and lists. An atomic vector
| contains exactly one data type, whereas a list may contain multiple data types. We'll
| explore atomic vectors further before we get to lists.

...
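
Note: the difference between an atomic vector and a list can be checked directly. A minimal sketch, not part of the lesson, with the illustrative names `mixed_vec` and `mixed_list`:

```r
# c() builds an atomic vector, so mixed inputs are converted to one common type
mixed_vec <- c(1, "a", TRUE)
mixed_vec                  # "1" "a" "TRUE"
typeof(mixed_vec)          # "character"

# list() keeps each element's own type
mixed_list <- list(1, "a", TRUE)
element_types <- sapply(mixed_list, typeof)
element_types              # "double" "character" "logical"
```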

  |====                                                                             |   5%
| In previous lessons, we dealt entirely with numeric vectors, which are one type of
| atomic vector. Other types of atomic vectors include logical, character, integer, and
| complex. In this lesson, we'll take a closer look at logical and character vectors.

...

  |======                                                                           |   8%
| Logical vectors can contain the values TRUE, FALSE, and NA (for 'not available'). These
| values are generated as the result of logical 'conditions'. Let's experiment with some
| simple conditions.

...

  |=========                                                                        |  11%
| First, create a numeric vector num_vect that contains the values 0.5, 55, -10, and 6.

> num_vect <- c(0.5, 55, -10, 6)

| You are doing so well!

  |===========                                                                      |  13%
| Now, create a variable called tf that gets the result of num_vect < 1, which is read as
| 'num_vect is less than 1'.

> tf <- num_vect < 1

| You're the best!

  |=============                                                                    |  16%
| What do you think tf will look like?

1: a vector of 4 logical values
2: a single logical value

Selection: 1

| You're the best!

  |===============                                                                  |  18%
| Print the contents of tf now.

> tf
[1]  TRUE FALSE  TRUE FALSE

| Excellent job!

  |=================                                                                |  21%
| The statement num_vect < 1 is a condition and tf tells us whether each corresponding
| element of our numeric vector num_vect satisfies this condition.

...

  |===================                                                              |  24%
| The first element of num_vect is 0.5, which is less than 1 and therefore the statement
| 0.5 < 1 is TRUE. The second element of num_vect is 55, which is greater than 1, so the
| statement 55 < 1 is FALSE. The same logic applies for the third and fourth elements.

...

  |=====================                                                            |  26%
| Let's try another. Type num_vect >= 6 without assigning the result to a new variable.

> num_vect >= 6
[1] FALSE  TRUE FALSE  TRUE

| Great job!

  |=======================                                                          |  29%
| This time, we are asking whether each individual element of num_vect is greater than OR
| equal to 6. Since only 55 and 6 are greater than or equal to 6, the second and fourth
| elements of the result are TRUE and the first and third elements are FALSE.

...

  |==========================                                                       |  32%
| The `<` and `>=` symbols in these examples are called 'logical operators'. Other logical
| operators include `>`, `<=`, `==` for exact equality, and `!=` for inequality.

...

  |============================                                                     |  34%
| If we have two logical expressions, A and B, we can ask whether at least one is TRUE
| with A | B (logical 'or' a.k.a. 'union') or whether they are both TRUE with A & B
| (logical 'and' a.k.a. 'intersection'). Lastly, !A is the negation of A and is TRUE when
| A is FALSE and vice versa.

...
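
Note: `|`, `&` and `!` work element by element on logical vectors. The sketch below reuses the values of num_vect from this lesson; the names `a` and `b` are illustrative assumptions.

```r
num_vect <- c(0.5, 55, -10, 6)   # same values as in the lesson

a <- num_vect < 1     # TRUE FALSE  TRUE FALSE
b <- num_vect >= 6    # FALSE  TRUE FALSE  TRUE

either  <- a | b      # TRUE TRUE TRUE TRUE      (at least one is TRUE)
both    <- a & b      # FALSE FALSE FALSE FALSE  (both are TRUE)
negated <- !a         # FALSE  TRUE FALSE  TRUE
```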

  |==============================                                                   |  37%
| It's a good idea to spend some time playing around with various combinations of these
| logical operators until you get comfortable with their use. We'll do a few examples here
| to get you started.

...

  |================================                                                 |  39%
| Try your best to predict the result of each of the following statements. You can use
| pencil and paper to work them out if it's helpful. If you get stuck, just guess and
| you've got a 50% chance of getting the right answer!

...

  |==================================                                               |  42%
| (3 > 5) & (4 == 4)

1: TRUE
2: FALSE

Selection: 2

| That's a job well done!

  |====================================                                             |  45%
| (TRUE == TRUE) | (TRUE == FALSE)

1: FALSE
2: TRUE

Selection: 2

| Your dedication is inspiring!

  |======================================                                           |  47%
| ((111 >= 111) | !(TRUE)) & ((4 + 1) == 5)

1: FALSE
2: TRUE

Selection: 2

| You are really on a roll!

  |========================================                                         |  50%
| Don't worry if you found these to be tricky. They're supposed to be. Working with
| logical statements in R takes practice, but your efforts will be rewarded in future
| lessons (e.g. subsetting and control structures).

...
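
Note: the three quiz statements above can be checked in the Console. Breaking the longest one into parts shows how it is evaluated. The names `left` and `right` are illustrative assumptions.

```r
q1 <- (3 > 5) & (4 == 4)                  # FALSE & TRUE  gives FALSE
q2 <- (TRUE == TRUE) | (TRUE == FALSE)    # TRUE | FALSE  gives TRUE

# Third statement, evaluated piece by piece
left  <- (111 >= 111) | !(TRUE)           # TRUE | FALSE  gives TRUE
right <- (4 + 1) == 5                     # TRUE
q3 <- left & right                        # TRUE
```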

  |===========================================                                      |  53%
| Character vectors are also very common in R. Double quotes are used to distinguish
| character objects, as in the following example.

...

  |=============================================                                    |  55%
| Create a character vector that contains the following words: "My", "name", "is".
| Remember to enclose each word in its own set of double quotes, so that R knows they are
| character strings. Store the vector in a variable called my_char.

> my_char <- c("My", "name", "is")

| You are amazing!

  |===============================================                                  |  58%
| Print the contents of my_char to see what it looks like.

> my_char
[1] "My"   "name" "is"  

| Excellent job!

  |=================================================                                |  61%
| Right now, my_char is a character vector of length 3. Let's say we want to join the
| elements of my_char together into one continuous character string (i.e. a character
| vector of length 1). We can do this using the paste() function.

...

  |===================================================                              |  63%
| Type paste(my_char, collapse = " ") now. Make sure there's a space between the double
| quotes in the `collapse` argument. You'll see why in a second.

> paste(my_char, collapse = " ")
[1] "My name is"

| You're the best!

  |=====================================================                            |  66%
| The `collapse` argument to the paste() function tells R that when we join together the
| elements of the my_char character vector, we'd like to separate them with single spaces.

...

  |=======================================================                          |  68%
| It seems that we're missing something.... Ah, yes! Your name!

...

  |==========================================================                       |  71%
| To add (or 'concatenate') your name to the end of my_char, use the c() function like
| this: c(my_char, "your_name_here"). Place your name in double quotes where I've put
| "your_name_here". Try it now, storing the result in a new variable called my_name.

> my_name <- c(my_char, "SL")

| You are quite good my friend!

  |============================================================                     |  74%
| Take a look at the contents of my_name.

> my_name
[1] "My"   "name" "is"   "SL"  

| Excellent job!

  |==============================================================                   |  76%
| Now, use the paste() function once more to join the words in my_name together into a
| single character string. Don't forget to say collapse = " "!

> paste(my_name, collapse = " ")
[1] "My name is SL"

| You are amazing!

  |================================================================                 |  79%
| In this example, we used the paste() function to collapse the elements of a single
| character vector. paste() can also be used to join the elements of multiple character
| vectors.

...

  |==================================================================               |  82%
| In the simplest case, we can join two character vectors that are each of length 1 (i.e.
| join two words). Try paste("Hello", "world!", sep = " "), where the `sep` argument tells
| R that we want to separate the joined elements with a single space.

> paste("Hello", "world!", sep = " ")
[1] "Hello world!"

| Excellent job!

  |====================================================================             |  84%
| For a slightly more complicated example, we can join two vectors, each of length 3. Use
| paste() to join the integer vector 1:3 with the character vector c("X", "Y", "Z"). This
| time, use sep = "" to leave no space between the joined elements.

> paste(1:3, c("X", "Y", "Z"), sep = "")
[1] "1X" "2Y" "3Z"

| Keep up the great work!
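
Note: `sep` and `collapse` do different jobs and can be used together. `sep` goes between the corresponding elements of the input vectors, and `collapse` then joins the resulting elements into one string. A small sketch beyond the lesson text, with illustrative variable names:

```r
labels <- paste(1:3, c("X", "Y", "Z"), sep = "")
labels                  # "1X" "2Y" "3Z"

one_string <- paste(1:3, c("X", "Y", "Z"), sep = "", collapse = "-")
one_string              # "1X-2Y-3Z"

# paste0() is a shorthand for paste(..., sep = "")
labels0 <- paste0(1:3, c("X", "Y", "Z"))
```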

  |======================================================================           |  87%
| What do you think will happen if our vectors are of different length? (Hint: we talked
| about this in a previous lesson.)

...

  |========================================================================         |  89%
| Vector recycling! Try paste(LETTERS, 1:4, sep = "-"), where LETTERS is a predefined
| variable in R containing a character vector of all 26 letters in the English alphabet.

> paste(LETTERS, 1:4, sep = "-")
 [1] "A-1" "B-2" "C-3" "D-4" "E-1" "F-2" "G-3" "H-4" "I-1" "J-2" "K-3" "L-4" "M-1" "N-2"
[15] "O-3" "P-4" "Q-1" "R-2" "S-3" "T-4" "U-1" "V-2" "W-3" "X-4" "Y-1" "Z-2"

| You are amazing!

  |===========================================================================      |  92%
| Since the character vector LETTERS is longer than the numeric vector 1:4, R simply
| recycles, or repeats, 1:4 until it matches the length of LETTERS.

...

  |=============================================================================    |  95%
| Also worth noting is that the numeric vector 1:4 gets 'coerced' into a character vector
| by the paste() function.

...

  |===============================================================================  |  97%
| We'll discuss coercion in another lesson, but all it really means is that the numbers 1,
| 2, 3, and 4 in the output above are no longer numbers to R, but rather characters "1",
| "2", "3", and "4".

...
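
Note: the coercion described above can be confirmed with class(). A minimal sketch; the names `nums` and `pasted` are illustrative assumptions.

```r
nums   <- 1:4
pasted <- paste(nums)      # "1" "2" "3" "4"

class(nums)                # "integer"
class(pasted)              # "character"

# Characters cannot be used in arithmetic until they are converted back
restored <- as.integer(pasted) + 1L   # 2 3 4 5
```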

  |=================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Generate Code
2: Yes
3: No

Selection: 3

| You are amazing!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 4.

6.4.5 Lesson 5: Missing Values

Same as R Programming 2.5.


| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 3

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 5

  |                                                                                 |   0%

| Missing values play an important role in statistics and data analysis. Often, missing
| values must not be ignored, but rather they should be carefully studied to see if
| there's an underlying pattern or cause for their missingness.

...

  |====                                                                             |   5%
| In R, NA is used to represent any value that is 'not available' or 'missing' (in the
| statistical sense). In this lesson, we'll explore missing values further.

...

  |========                                                                         |  10%
| Any operation involving NA generally yields NA as the result. To illustrate, let's
| create a vector c(44, NA, 5, NA) and assign it to a variable x.

> x <- c(44, NA, 5, NA)

| You nailed it! Good job!

  |============                                                                     |  15%
| Now, let's multiply x by 3.

> x * 3
[1] 132  NA  15  NA

| You nailed it! Good job!

  |================                                                                 |  20%
| Notice that the elements of the resulting vector that correspond with the NA values in x
| are also NA.

...
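
Note: NA also propagates through summary functions such as sum() and mean(). Their `na.rm` argument, which this lesson does not cover, removes the NAs before computing. A minimal sketch using the same x as above:

```r
x <- c(44, NA, 5, NA)

tripled <- x * 3                   # 132  NA  15  NA
total   <- sum(x)                  # NA
total_ok <- sum(x, na.rm = TRUE)   # 49
mean_ok  <- mean(x, na.rm = TRUE)  # 24.5
```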

  |====================                                                             |  25%
| To make things a little more interesting, lets create a vector containing 1000 draws
| from a standard normal distribution with y <- rnorm(1000).

> y <- rnorm(1000)

| You are doing so well!

  |========================                                                         |  30%
| Next, let's create a vector containing 1000 NAs with z <- rep(NA, 1000).

> z <- rep(NA, 1000)

| That's a job well done!

  |============================                                                     |  35%
| Finally, let's select 100 elements at random from these 2000 values (combining y and z)
| such that we don't know how many NAs we'll wind up with or what positions they'll occupy
| in our final vector -- my_data <- sample(c(y, z), 100).

> my_data <- sample(c(y, z), 100)

| Nice work!

  |================================                                                 |  40%
| Let's first ask the question of where our NAs are located in our data. The is.na()
| function tells us whether each element of a vector is NA. Call is.na() on my_data and
| assign the result to my_na.

> my_na <- is.na(my_data)

| You are amazing!

  |====================================                                             |  45%
| Now, print my_na to see what you came up with.

> my_na
  [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE
 [15] FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
 [29]  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE
 [43]  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE
 [57] FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
 [71]  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE
 [85]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE
 [99]  TRUE FALSE

| Keep working like that and you'll get there!

  |========================================                                         |  50%
| Everywhere you see a TRUE, you know the corresponding element of my_data is NA.
| Likewise, everywhere you see a FALSE, you know the corresponding element of my_data is
| one of our random draws from the standard normal distribution.

...

  |=============================================                                    |  55%
| In our previous discussion of logical operators, we introduced the `==` operator as a
| method of testing for equality between two objects. So, you might think the expression
| my_data == NA yields the same results as is.na(). Give it a try.

> my_data == NA
  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [29] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [57] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [85] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| You got it right!

  |=================================================                                |  60%
| The reason you got a vector of all NAs is that NA is not really a value, but just a
| placeholder for a quantity that is not available. Therefore the logical expression is
| incomplete and R has no choice but to return a vector of the same length as my_data that
| contains all NAs.

...

  |=====================================================                            |  65%
| Don't worry if that's a little confusing. The key takeaway is to be cautious when using
| logical expressions anytime NAs might creep in, since a single NA value can derail the
| entire thing.

...
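
Note: a smaller example makes the difference between `== NA` and is.na() easier to see. The vector `v` below is a hypothetical one, not the my_data of the lesson.

```r
v <- c(1.5, NA, -2)        # small hypothetical vector

eq_na <- v == NA           # NA NA NA  (comparing with NA is always NA)
is_na <- is.na(v)          # FALSE  TRUE FALSE

# isTRUE() returns FALSE for NA, which is one way to guard a comparison
# that might involve a missing value
guarded <- isTRUE(v[2] > 0)   # FALSE
```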

  |=========================================================                        |  70%
| So, back to the task at hand. Now that we have a vector, my_na, that has a TRUE for
| every NA and FALSE for every numeric value, we can compute the total number of NAs in
| our data.

...

  |=============================================================                    |  75%
| The trick is to recognize that underneath the surface, R represents TRUE as the number 1
| and FALSE as the number 0. Therefore, if we take the sum of a bunch of TRUEs and FALSEs,
| we get the total number of TRUEs.

...

  |=================================================================                |  80%
| Let's give that a try here. Call the sum() function on my_na to count the total number
| of TRUEs in my_na, and thus the total number of NAs in my_data. Don't assign the result
| to a new variable.

> sum(my_na)
[1] 49

| You got it!
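
Note: because TRUE counts as 1 and FALSE as 0, mean() of a logical vector gives the proportion of TRUEs. A small sketch with a hypothetical vector `flags`; the result for my_data will differ from run to run because it is sampled at random.

```r
flags <- c(TRUE, FALSE, TRUE, TRUE)   # hypothetical logical vector

as_numbers <- as.integer(flags)   # 1 0 1 1
n_true     <- sum(flags)          # 3
share_true <- mean(flags)         # 0.75
```

In the same way, mean(is.na(my_data)) would give the proportion of missing values in my_data.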

  |=====================================================================            |  85%
| Pretty cool, huh? Finally, let's take a look at the data to convince ourselves that
| everything 'adds up'. Print my_data to the console.

> my_data
  [1] -0.46514213          NA -1.21438163          NA -1.93649998          NA  0.56570802
  [8]          NA  2.07186036  2.34138900          NA  1.09724479 -0.91687042          NA
 [15] -1.48199159          NA -0.44705334          NA          NA  0.41013612          NA
 [22]          NA  2.13887264 -0.35674959  0.51280638  0.08629471  0.23256676          NA
 [29]          NA  0.23422638          NA          NA -1.72401551 -0.27057749          NA
 [36]          NA -0.37880914          NA          NA  2.66249148  2.33597049 -0.67723715
 [43]          NA -1.09887583          NA          NA  0.23574652          NA          NA
 [50] -0.95032731          NA          NA -1.55835366 -0.61947455          NA -0.80682398
 [57]  2.88283893 -0.03274661          NA -1.61410260          NA -0.94264828          NA
 [64] -1.26093842  1.87245636  0.18340868          NA  0.59449314          NA  1.35854303
 [71]          NA -0.61278741          NA          NA -1.53552942          NA          NA
 [78] -0.34104983          NA          NA          NA  0.30262988  0.11371792 -0.38617500
 [85]          NA -0.84709085          NA  0.30991821          NA  0.87560578          NA
 [92]          NA  0.24376361 -1.45355569          NA          NA          NA -0.53921256
 [99]          NA -0.38670598

| You are doing so well!

  |=========================================================================        |  90%
| Now that we've got NAs down pat, let's look at a second type of missing value -- NaN,
| which stands for 'not a number'. To generate NaN, try dividing (using a forward slash) 0
| by 0 now.

> 0/0
[1] NaN

| You got it right!

  |=============================================================================    |  95%
| Let's do one more, just for fun. In R, Inf stands for infinity. What happens if you
| subtract Inf from Inf?

> Inf - Inf
[1] NaN

| You're the best!
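
Note: NA, NaN and Inf can be told apart with is.na(), is.nan() and is.finite(). The sketch below goes slightly beyond the lesson; the vector `vals` is an illustrative assumption.

```r
vals <- c(1, NA, 0/0, Inf - Inf, 1/0)   # 1 NA NaN NaN Inf

na_flags     <- is.na(vals)       # FALSE  TRUE  TRUE  TRUE FALSE
nan_flags    <- is.nan(vals)      # FALSE FALSE  TRUE  TRUE FALSE
finite_flags <- is.finite(vals)   #  TRUE FALSE FALSE FALSE FALSE
```

is.na() is TRUE for NaN as well, whereas is.nan() is FALSE for NA.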

  |=================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: Generate Code
3: No

Selection: 3

| All that hard work is paying off!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 5.

6.4.6 Lesson 6: Subsetting Vectors

Same as R Programming 2.6.


| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 3

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 6

  |                                                                                 |   0%

| In this lesson, we'll see how to extract elements from a vector based on some conditions
| that we specify.

...

  |==                                                                               |   3%
| For example, we may only be interested in the first 20 elements of a vector, or only the
| elements that are not NA, or only those that are positive or correspond to a specific
| variable of interest. By the end of this lesson, you'll know how to handle each of these
| scenarios.

...

  |====                                                                             |   5%
| I've created for you a vector called x that contains a random ordering of 20 numbers
| (from a standard normal distribution) and 20 NAs. Type x now to see what it looks like.

> x
 [1]  0.289935741           NA           NA  1.034760784           NA  0.666018013
 [7]           NA           NA           NA           NA           NA -0.129023011
[13]           NA           NA           NA  1.937374200 -0.042131245           NA
[19] -0.962464028           NA -0.033586181           NA -0.959176306           NA
[25]           NA -0.009029318  1.025973766 -1.355439558 -0.110210766           NA
[31]  0.224059162  1.070662244 -0.391553826  0.684961771 -1.457592981 -0.196851453
[37]           NA  0.538363030           NA           NA

| That's the answer I was looking for.

  |======                                                                           |   8%
| The way you tell R that you want to select some particular elements (i.e. a 'subset')
| from a vector is by placing an 'index vector' in square brackets immediately following
| the name of the vector.

...

  |========                                                                         |  10%
| For a simple example, try x[1:10] to view the first ten elements of x.

> x[1:10]
 [1] 0.2899357        NA        NA 1.0347608        NA 0.6660180        NA        NA
 [9]        NA        NA

| You are amazing!

  |==========                                                                       |  13%
| Index vectors come in four different flavors -- logical vectors, vectors of positive
| integers, vectors of negative integers, and vectors of character strings -- each of
| which we'll cover in this lesson.

...

  |============                                                                     |  15%
| Let's start by indexing with logical vectors. One common scenario when working with
| real-world data is that we want to extract all elements of a vector that are not NA
| (i.e. missing data). Recall that is.na(x) yields a vector of logical values the same
| length as x, with TRUEs corresponding to NA values in x and FALSEs corresponding to
| non-NA values in x.

...

  |===============                                                                  |  18%
| What do you think x[is.na(x)] will give you?

1: A vector of length 0
2: A vector of all NAs
3: A vector of TRUEs and FALSEs
4: A vector with no NAs

Selection: 2

| You are quite good my friend!

  |=================                                                                |  21%
| Prove it to yourself by typing x[is.na(x)].

> x[is.na(x)]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| You are doing so well!

  |===================                                                              |  23%
| Recall that `!` gives us the negation of a logical expression, so !is.na(x) can be read
| as 'is not NA'. Therefore, if we want to create a vector called y that contains all of
| the non-NA values from x, we can use y <- x[!is.na(x)]. Give it a try.

> y <- x[!is.na(x)]

| Excellent job!

  |=====================                                                            |  26%
| Print y to the console.

> y
 [1]  0.289935741  1.034760784  0.666018013 -0.129023011  1.937374200 -0.042131245
 [7] -0.962464028 -0.033586181 -0.959176306 -0.009029318  1.025973766 -1.355439558
[13] -0.110210766  0.224059162  1.070662244 -0.391553826  0.684961771 -1.457592981
[19] -0.196851453  0.538363030

| That's the answer I was looking for.

  |=======================                                                          |  28%
| Now that we've isolated the non-missing values of x and put them in y, we can subset y
| as we please.

...

  |=========================                                                        |  31%
| Recall that the expression y > 0 will give us a vector of logical values the same length
| as y, with TRUEs corresponding to values of y that are greater than zero and FALSEs
| corresponding to values of y that are less than or equal to zero. What do you think y[y
| > 0] will give you?

1: A vector of all the negative elements of y
2: A vector of all the positive elements of y
3: A vector of length 0
4: A vector of all NAs
5: A vector of TRUEs and FALSEs

Selection: 2

| Nice work!

  |===========================                                                      |  33%
| Type y[y > 0] to see that we get all of the positive elements of y, which are also the
| positive elements of our original vector x.

> y[y > 0]
[1] 0.2899357 1.0347608 0.6660180 1.9373742 1.0259738 0.2240592 1.0706622 0.6849618
[9] 0.5383630

| All that practice is paying off!

  |=============================                                                    |  36%
| You might wonder why we didn't just start with x[x > 0] to isolate the positive elements
| of x. Try that now to see why.

> x[x > 0]
 [1] 0.2899357        NA        NA 1.0347608        NA 0.6660180        NA        NA
 [9]        NA        NA        NA        NA        NA        NA 1.9373742        NA
[17]        NA        NA        NA        NA 1.0259738        NA 0.2240592 1.0706622
[25] 0.6849618        NA 0.5383630        NA        NA

| That's the answer I was looking for.

  |===============================                                                  |  38%
| Since NA is not a value, but rather a placeholder for an unknown quantity, the
| expression NA > 0 evaluates to NA. Hence we get a bunch of NAs mixed in with our
| positive numbers when we do this.

...

  |=================================                                                |  41%
| Combining our knowledge of logical operators with our new knowledge of subsetting, we
| could do this -- x[!is.na(x) & x > 0]. Try it out.

> x[!is.na(x) & x > 0]
[1] 0.2899357 1.0347608 0.6660180 1.9373742 1.0259738 0.2240592 1.0706622 0.6849618
[9] 0.5383630

| You got it right!

  |===================================                                              |  44%
| In this case, we request only values of x that are both non-missing AND greater than
| zero.

...
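
The difference between the two approaches above can be reproduced on a small example. This is a minimal sketch that assumes a hypothetical vector v (not the lesson's x); which() is shown as one more base R option that the lesson itself does not use.

```r
# Hypothetical vector mixing numbers and missing values (not the lesson's x)
v <- c(1.5, NA, -0.3, 2.0, NA, -1.2)

# NA > 0 is NA, so the logical index itself contains NA,
# and each NA in the index produces an NA in the result.
naive <- v[v > 0]

# Requiring "not missing AND positive" removes the NAs.
clean <- v[!is.na(v) & v > 0]

# which() returns only the positions where the condition is TRUE,
# silently skipping both FALSE and NA.
also_clean <- v[which(v > 0)]

print(naive)       # 1.5  NA 2.0  NA
print(clean)       # 1.5 2.0
print(also_clean)  # 1.5 2.0
```

Which form to prefer is a matter of taste; the explicit !is.na() test states the intent in the code itself.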

  |=====================================                                            |  46%
| I've already shown you how to subset just the first ten values of x using x[1:10]. In
| this case, we're providing a vector of positive integers inside of the square brackets,
| which tells R to return only the elements of x numbered 1 through 10.

...

  |=======================================                                          |  49%
| Many programming languages use what's called 'zero-based indexing', which means that the
| first element of a vector is considered element 0. R uses 'one-based indexing', which
| (you guessed it!) means the first element of a vector is considered element 1.

...

  |==========================================                                       |  51%
| Can you figure out how we'd subset the 3rd, 5th, and 7th elements of x? Hint -- Use the
| c() function to specify the element numbers as a numeric vector.

> x[c(3,5,7)]
[1] NA NA NA

| Great job!

  |============================================                                     |  54%
| It's important that when using integer vectors to subset our vector x, we stick with the
| set of indexes {1, 2, ..., 40} since x only has 40 elements. What happens if we ask for
| the zeroth element of x (i.e. x[0])? Give it a try.

> x[0]
numeric(0)

| You nailed it! Good job!

  |==============================================                                   |  56%
| As you might expect, we get nothing useful. Unfortunately, R doesn't prevent us from
| doing this. What if we ask for the 3000th element of x? Try it out.

> x[3000]
[1] NA

| All that practice is paying off!

  |================================================                                 |  59%
| Again, nothing useful, but R doesn't prevent us from asking for it. This should be a
| cautionary tale. You should always make sure that what you are asking for is within the
| bounds of the vector you're working with.

...
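
Since R returns a result instead of an error for out-of-range indexes, one way to follow the advice above is to check the bounds yourself. This is a minimal sketch; safe_get() is a hypothetical helper written for this illustration and is not part of swirl or base R.

```r
v <- c(10, 20, 30)

zero_index <- v[0]   # zero-length numeric vector, no error
too_far    <- v[5]   # NA, no error

# Hypothetical helper: stop with an error unless i is a single
# index between 1 and length(vec).
safe_get <- function(vec, i) {
  stopifnot(is.numeric(i), length(i) == 1, i >= 1, i <= length(vec))
  vec[i]
}

inside <- safe_get(v, 2)   # 20

# An out-of-range request now raises an error, caught here for display.
outside <- tryCatch(safe_get(v, 5), error = function(e) "out of bounds")
print(outside)             # "out of bounds"
```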

  |==================================================                               |  62%
| What if we're interested in all elements of x EXCEPT the 2nd and 10th? It would be
| pretty tedious to construct a vector containing all numbers 1 through 40 EXCEPT 2 and
| 10.

...

  |====================================================                             |  64%
| Luckily, R accepts negative integer indexes. Whereas x[c(2, 10)] gives us ONLY the 2nd
| and 10th elements of x, x[c(-2, -10)] gives us all elements of x EXCEPT for the 2nd and
| 10th elements. Try x[c(-2, -10)] now to see this.

> x[c(-2, -10)]
 [1]  0.289935741           NA  1.034760784           NA  0.666018013           NA
 [7]           NA           NA           NA -0.129023011           NA           NA
[13]           NA  1.937374200 -0.042131245           NA -0.962464028           NA
[19] -0.033586181           NA -0.959176306           NA           NA -0.009029318
[25]  1.025973766 -1.355439558 -0.110210766           NA  0.224059162  1.070662244
[31] -0.391553826  0.684961771 -1.457592981 -0.196851453           NA  0.538363030
[37]           NA           NA

| Great job!

  |======================================================                           |  67%
| A shorthand way of specifying multiple negative numbers is to put the negative sign out
| in front of the vector of positive numbers. Type x[-c(2, 10)] to get the exact same
| result.

> x[-c(2, 10)]
 [1]  0.289935741           NA  1.034760784           NA  0.666018013           NA
 [7]           NA           NA           NA -0.129023011           NA           NA
[13]           NA  1.937374200 -0.042131245           NA -0.962464028           NA
[19] -0.033586181           NA -0.959176306           NA           NA -0.009029318
[25]  1.025973766 -1.355439558 -0.110210766           NA  0.224059162  1.070662244
[31] -0.391553826  0.684961771 -1.457592981 -0.196851453           NA  0.538363030
[37]           NA           NA

| All that practice is paying off!
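
The two spellings of negative indexing can be compared on a short vector. This is a minimal sketch that assumes a hypothetical integer vector v; the last lines show that positive and negative indexes cannot be mixed in one request, a point the lesson does not cover.

```r
v <- 11:20   # hypothetical vector with 10 elements

dropped_a <- v[c(-2, -10)]   # everything except the 2nd and 10th
dropped_b <- v[-c(2, 10)]    # same request, minus sign in front

print(dropped_a)             # 11 13 14 15 16 17 18 19

# Mixing positive and negative indexes is an error in R.
mixed <- tryCatch(v[c(-1, 2)], error = function(e) "error")
print(mixed)                 # "error"
```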

  |========================================================                         |  69%
| So far, we've covered three types of index vectors -- logical, positive integer, and
| negative integer. The only remaining type requires us to introduce the concept of
| 'named' elements.

...

  |==========================================================                       |  72%
| Create a numeric vector with three named elements using vect <- c(foo = 11, bar = 2,
| norf = NA).

> vect <- c(foo = 11, bar = 2, norf = NA)

| Keep working like that and you'll get there!

  |============================================================                     |  74%
| When we print vect to the console, you'll see that each element has a name. Try it out.

> vect
 foo  bar norf 
  11    2   NA 

| Nice work!

  |==============================================================                   |  77%
| We can also get the names of vect by passing vect as an argument to the names()
| function. Give that a try.

> names(vect)
[1] "foo"  "bar"  "norf"

| Great job!

  |================================================================                 |  79%
| Alternatively, we can create an unnamed vector vect2 with c(11, 2, NA). Do that now.

> vect2 <- c(11, 2, NA)

| You are quite good my friend!

  |==================================================================               |  82%
| Then, we can add the `names` attribute to vect2 after the fact with names(vect2) <-
| c("foo", "bar", "norf"). Go ahead.

> names(vect2) <- c("foo", "bar", "norf")

| You're the best!

  |=====================================================================            |  85%
| Now, let's check that vect and vect2 are the same by passing them as arguments to the
| identical() function.

> identical(vect, vect2)
[1] TRUE

| Nice work!

  |=======================================================================          |  87%
| Indeed, vect and vect2 are identical named vectors.

...

  |=========================================================================        |  90%
| Now, back to the matter of subsetting a vector by named elements. Which of the following
| commands do you think would give us the second element of vect?

1: vect["2"]
2: vect[bar]
3: vect["bar"]

Selection: 3

| You are doing so well!

  |===========================================================================      |  92%
| Now, try it out.

> vect["bar"]
bar 
  2 

| Excellent job!

  |=============================================================================    |  95%
| Likewise, we can specify a vector of names with vect[c("foo", "bar")]. Try it out.

> vect[c("foo", "bar")]
foo bar 
 11   2 

| You got it!

  |===============================================================================  |  97%
| Now you know all four methods of subsetting data from vectors. Different approaches are
| best in different scenarios and when in doubt, try it out!

...
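
The four kinds of index vectors covered in this lesson (logical, positive integer, negative integer, and character names) can be placed side by side. This is a minimal sketch on a hypothetical named vector called scores, built the same way as vect; here all four requests return the same two elements.

```r
scores <- c(foo = 11, bar = 2, norf = NA)   # hypothetical named vector

by_logical  <- scores[!is.na(scores)]       # logical index
by_position <- scores[c(1, 2)]              # positive integers
by_negative <- scores[-3]                   # negative integer
by_name     <- scores[c("foo", "bar")]      # character names

print(by_name)
# foo bar
#  11   2

# A name that does not exist gives NA rather than an error.
missing_name <- scores["nope"]
```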

  |=================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Generate Code
2: Yes
3: No

Selection: 3

| Nice work!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 6.

6.4.7 Lesson 7: Matrices and Data Frames

Same as R Programming 2.7.


| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 3

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 7

  |                                                                                 |   0%

| In this lesson, we'll cover matrices and data frames. Both represent 'rectangular' data
| types, meaning that they are used to store tabular data, with rows and columns.

...

  |==                                                                               |   3%
| The main difference, as you'll see, is that matrices can only contain a single class of
| data, while data frames can consist of many different classes of data.

...

  |====                                                                             |   6%
| Let's create a vector containing the numbers 1 through 20 using the `:` operator. Store
| the result in a variable called my_vector.

> my_vector <- 1:20

| You are amazing!

  |=======                                                                          |   8%
| View the contents of the vector you just created.

> my_vector
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

| That's a job well done!

  |=========                                                                        |  11%
| The dim() function tells us the 'dimensions' of an object. What happens if we do
| dim(my_vector)? Give it a try.

> dim(my_vector)
NULL

| Excellent work!

  |===========                                                                      |  14%
| Clearly, that's not very helpful! Since my_vector is a vector, it doesn't have a `dim`
| attribute (so it's just NULL), but we can find its length using the length() function.
| Try that now.

> length(my_vector)
[1] 20

| Keep working like that and you'll get there!

  |==============                                                                   |  17%
| Ah! That's what we wanted. But, what happens if we give my_vector a `dim` attribute?
| Let's give it a try. Type dim(my_vector) <- c(4, 5).

> dim(my_vector) <- c(4, 5)

| You are doing so well!

  |================                                                                 |  19%
| It's okay if that last command seemed a little strange to you. It should! The dim()
| function allows you to get OR set the `dim` attribute for an R object. In this case, we
| assigned the value c(4, 5) to the `dim` attribute of my_vector.

...

  |==================                                                               |  22%
| Use dim(my_vector) to confirm that we've set the `dim` attribute correctly.

> dim(my_vector)
[1] 4 5

| All that hard work is paying off!

  |====================                                                             |  25%
| Another way to see this is by calling the attributes() function on my_vector. Try it
| now.

> attributes(my_vector)
$dim
[1] 4 5


| That's correct!

  |======================                                                           |  28%
| Just like in math class, when dealing with a 2-dimensional object (think rectangular
| table), the first number is the number of rows and the second is the number of columns.
| Therefore, we just gave my_vector 4 rows and 5 columns.

...

  |=========================                                                        |  31%
| But, wait! That doesn't sound like a vector any more. Well, it's not. Now it's a matrix.
| View the contents of my_vector now to see what it looks like.

> my_vector
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

| That's correct!

  |===========================                                                      |  33%
| Now, let's confirm it's actually a matrix by using the class() function. Type
| class(my_vector) to see what I mean.

> class(my_vector)
[1] "matrix"

| Excellent work!

  |=============================                                                    |  36%
| Sure enough, my_vector is now a matrix. We should store it in a new variable that helps
| us remember what it is. Store the value of my_vector in a new variable called my_matrix.

> my_matrix <- my_vector

| That's the answer I was looking for.

  |===============================                                                  |  39%
| The example that we've used so far was meant to illustrate the point that a matrix is
| simply an atomic vector with a dimension attribute. A more direct method of creating the
| same matrix uses the matrix() function.

...

  |==================================                                               |  42%
| Bring up the help file for the matrix() function now using the `?` function.

> ?matrix

| All that practice is paying off!

  |====================================                                             |  44%
| Now, look at the documentation for the matrix function and see if you can figure out how
| to create a matrix containing the same numbers (1-20) and dimensions (4 rows, 5 columns)
| by calling the matrix() function. Store the result in a variable called my_matrix2.

> my_matrix2 <- matrix(1:20, nrow = 4)

| Perseverance, that's the answer.

  |======================================                                           |  47%
| Finally, let's confirm that my_matrix and my_matrix2 are actually identical. The
| identical() function will tell us if its first two arguments are the same. Try it out.

> identical(my_matrix, my_matrix2)
[1] TRUE

| That's a job well done!
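
The two ways of building the matrix can be repeated outside swirl. This is a minimal sketch using hypothetical names m1, m2, and m3; the byrow argument of matrix() is a real argument that the lesson does not use, shown here because the default fills column by column. Note that the log above shows class() returning "matrix", while R 4.0.0 and later print "matrix" "array" for the same call, so is.matrix() is used here instead.

```r
# Route 1: give a plain vector a dim attribute
m1 <- 1:20
dim(m1) <- c(4, 5)

# Route 2: call matrix() directly (fills column by column by default)
m2 <- matrix(1:20, nrow = 4, ncol = 5)

same <- identical(m1, m2)   # TRUE
is_mat <- is.matrix(m1)     # TRUE

# byrow = TRUE fills row by row instead
m3 <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)

print(m2[1, ])   # 1 5 9 13 17
print(m3[1, ])   # 1 2 3 4 5
```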

  |========================================                                         |  50%
| Now, imagine that the numbers in our table represent some measurements from a clinical
| experiment, where each row represents one patient and each column represents one
| variable for which measurements were taken.

...

  |===========================================                                      |  53%
| We may want to label the rows, so that we know which numbers belong to each patient in
| the experiment. One way to do this is to add a column to the matrix, which contains the
| names of all four people.

...

  |=============================================                                    |  56%
| Let's start by creating a character vector containing the names of our patients -- Bill,
| Gina, Kelly, and Sean. Remember that double quotes tell R that something is a character
| string. Store the result in a variable called patients.

> patients <- c("Bill", "Gina", "Kelly", "Sean")

| You are amazing!

  |===============================================                                  |  58%
| Now we'll use the cbind() function to 'combine columns'. Don't worry about storing the
| result in a new variable. Just call cbind() with two arguments -- the patients vector
| and my_matrix.

> cbind(patients, my_matrix)
     patients                       
[1,] "Bill"   "1" "5" "9"  "13" "17"
[2,] "Gina"   "2" "6" "10" "14" "18"
[3,] "Kelly"  "3" "7" "11" "15" "19"
[4,] "Sean"   "4" "8" "12" "16" "20"

| You're the best!

  |=================================================                                |  61%
| Something is fishy about our result! It appears that combining the character vector with
| our matrix of numbers caused everything to be enclosed in double quotes. This means
| we're left with a matrix of character strings, which is no good.

...

  |====================================================                             |  64%
| If you remember back to the beginning of this lesson, I told you that matrices can only
| contain ONE class of data. Therefore, when we tried to combine a character vector with a
| numeric matrix, R was forced to 'coerce' the numbers to characters, hence the double
| quotes.

...

  |======================================================                           |  67%
| This is called 'implicit coercion', because we didn't ask for it. It just happened. But
| why didn't R just convert the names of our patients to numbers? I'll let you ponder that
| question on your own.

...
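
Implicit coercion can be observed directly with c() and cbind(). This is a minimal sketch with hypothetical values; it also hints at one possible answer to the question above: a number can always be written as a string, but a name such as "Bill" has no numeric value, so converting it produces NA.

```r
# Mixing types in c() coerces everything to the most general type.
mixed   <- c(1, "a", TRUE)      # character: "1" "a" "TRUE"
num_lgl <- c(1, TRUE, FALSE)    # numeric:   1 1 0

# cbind() of a character vector and a numeric matrix gives a character matrix.
combined <- cbind(c("Bill", "Gina"), matrix(1:4, nrow = 2))
combined_type <- typeof(combined)   # "character"

# Going the other way loses information: a name has no numeric value.
# as.numeric() warns here; the warning is suppressed for the demo.
name_as_number <- suppressWarnings(as.numeric("Bill"))   # NA
```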

  |========================================================                         |  69%
| So, we're still left with the question of how to include the names of our patients in
| the table without destroying the integrity of our numeric data. Try the following --
| my_data <- data.frame(patients, my_matrix)

> my_data <- data.frame(patients, my_matrix)

| You are really on a roll!

  |==========================================================                       |  72%
| Now view the contents of my_data to see what we've come up with.

> my_data
  patients X1 X2 X3 X4 X5
1     Bill  1  5  9 13 17
2     Gina  2  6 10 14 18
3    Kelly  3  7 11 15 19
4     Sean  4  8 12 16 20

| You are doing so well!

  |=============================================================                    |  75%
| It looks like the data.frame() function allowed us to store our character vector of
| names right alongside our matrix of numbers. That's exactly what we were hoping for!

...

  |===============================================================                  |  78%
| Behind the scenes, the data.frame() function takes any number of arguments and returns a
| single object of class `data.frame` that is composed of the original objects.

...

  |=================================================================                |  81%
| Let's confirm this by calling the class() function on our newly created data frame.

> class(my_data)
[1] "data.frame"

| You got it right!

  |====================================================================             |  83%
| It's also possible to assign names to the individual rows and columns of a data frame,
| which presents another possible way of determining which row of values in our table
| belongs to each patient.

...

  |======================================================================           |  86%
| However, since we've already solved that problem, let's solve a different problem by
| assigning names to the columns of our data frame so that we know what type of
| measurement each column represents.

...

  |========================================================================         |  89%
| Since we have six columns (including patient names), we'll need to first create a vector
| containing one element for each column. Create a character vector called cnames that
| contains the following values (in order) -- "patient", "age", "weight", "bp", "rating",
| "test".

> cnames <- c("patient", "age", "weight", "bp", "rating", "test")

| That's a job well done!

  |==========================================================================       |  92%
| Now, use the colnames() function to set the `colnames` attribute for our data frame.
| This is similar to the way we used the dim() function earlier in this lesson.

> colnames(my_data) <- cnames

| Your dedication is inspiring!

  |============================================================================     |  94%
| Let's see if that got the job done. Print the contents of my_data.

> my_data
  patient age weight bp rating test
1    Bill   1      5  9     13   17
2    Gina   2      6 10     14   18
3   Kelly   3      7 11     15   19
4    Sean   4      8 12     16   20

| You are really on a roll!
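
The steps of this part of the lesson can be collected into one short script. This is a minimal sketch that reuses the names from the log (patients, my_matrix, my_data, cnames); the last lines, which pick out one column and one cell, are additions not found in the lesson. The class of the patient column depends on the R version (factor before R 4.0.0, character from 4.0.0 on), so no output is shown for sapply().

```r
patients  <- c("Bill", "Gina", "Kelly", "Sean")
my_matrix <- matrix(1:20, nrow = 4)

# Unlike cbind(), data.frame() keeps each column's own type.
my_data <- data.frame(patients, my_matrix)
default_names <- names(my_data)   # "patients" "X1" "X2" "X3" "X4" "X5"

cnames <- c("patient", "age", "weight", "bp", "rating", "test")
colnames(my_data) <- cnames

# One class per column; the numeric columns stay integer.
print(sapply(my_data, class))

# Extract a whole column, or a single cell by condition and column name.
ages    <- my_data$age                                  # 1 2 3 4
gina_bp <- my_data[my_data$patient == "Gina", "bp"]     # 10
```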

  |===============================================================================  |  97%
| In this lesson, you learned the basics of working with two very important and common
| data structures -- matrices and data frames. There's much more to learn and we'll be
| covering more advanced topics, particularly with respect to data frames, in future
| lessons.

...

  |=================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Generate Code
2: Yes
3: No

Selection: 3

| That's correct!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 7.

6.4.8 Lesson 8: Logic

Same as R Programming 2.8.

| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 3

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 8

  |                                                                                 |   0%

| This lesson is meant to be a short introduction to logical operations in R.

...

  |==                                                                               |   2%
| There are two logical values in R, also called boolean values. They are TRUE and FALSE.
| In R you can construct logical expressions which will evaluate to either TRUE or FALSE.

...

  |===                                                                              |   4%
| Many of the questions in this lesson will involve evaluating logical expressions. It may
| be useful to open up a second R terminal where you can experiment with some of these
| expressions.

...

  |=====                                                                            |   6%
| Creating logical expressions requires logical operators. You're probably familiar with
| arithmetic operators like `+`, `-`, `*`, and `/`. The first logical operator we are
| going to discuss is the equality operator, represented by two equals signs `==`. Use the
| equality operator below to find out if TRUE is equal to TRUE.

> TRUE == TRUE
[1] TRUE

| Keep up the great work!

  |======                                                                           |   8%
| Just like arithmetic, logical expressions can be grouped by parentheses so that the
| entire expression (TRUE == TRUE) == TRUE evaluates to TRUE.

...

  |========                                                                         |  10%
| To test out this property, try evaluating (FALSE == TRUE) == FALSE .

> (FALSE == TRUE) == FALSE
[1] TRUE

| You are really on a roll!

  |=========                                                                        |  12%
| The equality operator can also be used to compare numbers. Use `==` to see if 6 is equal
| to 7.

> 6 == 7
[1] FALSE

| Keep up the great work!

  |===========                                                                      |  13%
| The previous expression evaluates to FALSE because 6 is less than 7. Thankfully, there
| are inequality operators that allow us to test if a value is less than or greater than
| another value.

...

  |============                                                                     |  15%
| The less than operator `<` tests whether the number on the left side of the operator
| (called the left operand) is less than the number on the right side of the operator
| (called the right operand). Write an expression to test whether 6 is less than 7.

> 6 < 7
[1] TRUE

| That's the answer I was looking for.

  |==============                                                                   |  17%
| There is also a less-than-or-equal-to operator `<=` which tests whether the left operand
| is less than or equal to the right operand. Write an expression to test whether 10 is
| less than or equal to 10.

> 10 <= 10
[1] TRUE

| All that hard work is paying off!

  |================                                                                 |  19%
| Keep in mind that there are the corresponding greater than `>` and
| greater-than-or-equal-to `>=` operators.

...

  |=================                                                                |  21%
| Which of the following evaluates to FALSE?

1: 7 == 7
2: 9 >= 10
3: 0 > -36
4: 6 < 8

Selection: 2

| Great job!

  |===================                                                              |  23%
| Which of the following evaluates to TRUE?

1: 7 == 9
2: -6 > -7
3: 9 >= 10
4: 57 < 8

Selection: 2

| That's the answer I was looking for.

  |====================                                                             |  25%
| The next operator we will discuss is the 'not equals' operator represented by `!=`. Not
| equals tests whether two values are unequal, so TRUE != FALSE evaluates to TRUE. Like
| the equality operator, `!=` can also be used with numbers. Try writing an expression to
| see if 5 is not equal to 7.

> 5 != 7
[1] TRUE

| You are doing so well!

  |======================                                                           |  27%
| In order to negate boolean expressions you can use the NOT operator. An exclamation
| point `!` will cause !TRUE (say: not true) to evaluate to FALSE and !FALSE (say: not
| false) to evaluate to TRUE. Try using the NOT operator and the equals operator to find
| the opposite of whether 5 is equal to 7.

> !(5==7)
[1] TRUE

| That's a job well done!

  |=======================                                                          |  29%
| Let's take a moment to review. The equals operator `==` tests whether two boolean values
| or numbers are equal, the not equals operator `!=` tests whether two boolean values or
| numbers are unequal, and the NOT operator `!` negates logical expressions so that TRUE
| expressions become FALSE and FALSE expressions become TRUE.

...

  |=========================                                                        |  31%
| Which of the following evaluates to FALSE?

1: !(0 >= -1)
2: !FALSE
3: 7 != 8
4: 9 < 10

Selection: 1

| All that hard work is paying off!

  |==========================                                                       |  33%
| What do you think the following expression will evaluate to?: (TRUE != FALSE) == !(6 ==
| 7)

1: Can there be objective truth when programming?
2: TRUE
3: %>%
4: FALSE

Selection: 2

| You got it!

  |============================                                                     |  35%
| At some point you may need to examine relationships between multiple logical
| expressions. This is where the AND operator and the OR operator come in.

...

  |==============================                                                   |  37%
| Let's look at how the AND operator works. There are two AND operators in R, `&` and
| `&&`. Both operators work similarly: if the right and left operands of AND are both TRUE
| the entire expression is TRUE, otherwise it is FALSE. For example, TRUE & TRUE evaluates
| to TRUE. Try typing FALSE & FALSE to see how it is evaluated.

> FALSE & FALSE
[1] FALSE

| You are quite good my friend!

  |===============================                                                  |  38%
| You can use the `&` operator to evaluate AND across a vector. The `&&` version of AND
| only evaluates the first member of a vector. Let's test both for practice. Type the
| expression TRUE & c(TRUE, FALSE, FALSE).

> TRUE & c(TRUE, FALSE, FALSE)
[1]  TRUE FALSE FALSE

| Your dedication is inspiring!

  |=================================                                                |  40%
| What happens in this case is that the left operand `TRUE` is recycled across every
| element in the vector of the right operand. This is equivalent to the statement c(TRUE,
| TRUE, TRUE) & c(TRUE, FALSE, FALSE).

...

  |==================================                                               |  42%
| Now we'll type the same expression except we'll use the `&&` operator. Type the
| expression TRUE && c(TRUE, FALSE, FALSE).

> TRUE && c(TRUE, FALSE, FALSE)
[1] TRUE

| Keep up the great work!

  |====================================                                             |  44%
| In this case, the left operand is only evaluated with the first member of the right
| operand (the vector). The rest of the elements in the vector aren't evaluated at all in
| this expression.

...

  |=====================================                                            |  46%
| The OR operator follows a similar set of rules. The `|` version of OR evaluates OR
| across an entire vector, while the `||` version of OR only evaluates the first member of
| a vector.

...

  |=======================================                                          |  48%
| An expression using the OR operator will evaluate to TRUE if the left operand or the
| right operand is TRUE. If both are TRUE, the expression will evaluate to TRUE, however
| if neither are TRUE, then the expression will be FALSE.

...

  |========================================                                         |  50%
| Let's test out the vectorized version of the OR operator. Type the expression TRUE |
| c(TRUE, FALSE, FALSE).

> TRUE | c(TRUE, FALSE, FALSE)
[1] TRUE TRUE TRUE

| That's the answer I was looking for.

  |==========================================                                       |  52%
| Now let's try out the non-vectorized version of the OR operator. Type the expression
| TRUE || c(TRUE, FALSE, FALSE).

> TRUE || c(TRUE, FALSE, FALSE)
[1] TRUE

| That's a job well done!
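
The vectorized and scalar forms of AND and OR can be compared on hypothetical vectors a and b. This is a minimal sketch. Note that the log above was recorded with an older version of R: to my knowledge, from R 4.3.0 on, `&&` and `||` raise an error when an operand that is actually evaluated has length greater than one, so a call such as TRUE && c(TRUE, FALSE, FALSE) no longer returns TRUE. For that reason the sketch only uses the scalar forms with single values.

```r
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

and_vec  <- a & b       # element-wise AND: TRUE FALSE FALSE
or_vec   <- a | b       # element-wise OR:  TRUE TRUE FALSE
recycled <- TRUE & b    # TRUE is recycled to length 3: TRUE FALSE FALSE

# The scalar forms short-circuit: the right side is skipped when the
# left side already decides the result.
value   <- NULL
guarded <- !is.null(value) && value > 0   # FALSE; value > 0 is never evaluated

# To reduce a whole logical vector to one TRUE/FALSE, use any() or all().
any_true <- any(b)      # TRUE
all_true <- all(b)      # FALSE
```

A common convention is to use `&` and `|` for subsetting vectors and `&&` and `||` inside if() conditions.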

  |============================================                                     |  54%
| Logical operators can be chained together just like arithmetic operators. The
| expressions: `6 != 10 && FALSE && 1 >= 2` or `TRUE || 5 < 9.3 || FALSE` are perfectly
| normal to see.

...

  |=============================================                                    |  56%
| As you may recall, arithmetic has an order of operations and so do logical
| expressions. All AND operators are evaluated before OR operators. Let's look at an
| example of an ambiguous case. Type: 5 > 8 || 6 != 8 && 4 > 3.9

> 5 > 8 || 6 != 8 && 4 > 3.9
[1] TRUE

| That's correct!

  |===============================================                                  |  58%
| Let's walk through the order of operations in the above case. First the left and right
| operands of the AND operator are evaluated. 6 is not equal to 8, 4 is greater than 3.9,
| therefore both operands are TRUE so the resulting expression `TRUE && TRUE` evaluates
| to TRUE. Then the left operand of the OR operator is evaluated: 5 is not greater than
| 8 so the entire expression is reduced to FALSE || TRUE. Since the right operand of
| this expression is TRUE the entire expression evaluates to TRUE.

...
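
In the lesson's example the result is TRUE whichever operator is applied first, so the effect of precedence is easier to see in a case where the grouping changes the answer. This is a minimal sketch with hypothetical expressions; the last line relies on the documented rule that `!` binds more loosely than comparison operators.

```r
# AND is evaluated before OR, so this reads as TRUE || (FALSE && FALSE).
default_order <- TRUE || FALSE && FALSE      # TRUE

# Parentheses force OR to be evaluated first and change the result.
forced_order <- (TRUE || FALSE) && FALSE     # FALSE

# The lesson's expression, with the implicit grouping written out.
lesson_expr <- 5 > 8 || (6 != 8 && 4 > 3.9)  # TRUE

# `!` has lower precedence than `==`, so this is !(5 == 7).
negated <- !5 == 7                           # TRUE
```

When an expression mixes several operators, writing the parentheses explicitly costs little and removes the ambiguity.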

  |================================================                                 |  60%
| Which one of the following expressions evaluates to TRUE?

1: TRUE && FALSE || 9 >= 4 && 3 < 6
2: FALSE || TRUE && FALSE
3: TRUE && 62 < 62 && 44 >= 44
4: 99.99 > 100 || 45 < 7.3 || 4 != 4.0

Selection: 1

| You are quite good my friend!

  |==================================================                               |  62%
| Which one of the following expressions evaluates to FALSE?

1: FALSE || TRUE && 6 != 4 || 9 > 4
2: FALSE && 6 >= 6 || 7 >= 8 || 50 <= 49.5
3: 6 >= -9 && !(6 > 7) && !(!TRUE)
4: !(8 > 4) ||  5 == 5.0 && 7.8 >= 7.79

Selection: 2

| Excellent work!

  |===================================================                              |  63%
| Now that you're familiar with R's logical operators you can take advantage of a few
| functions that R provides for dealing with logical expressions.

...

  |=====================================================                            |  65%
| The function isTRUE() takes one argument. If that argument evaluates to TRUE, the
| function will return TRUE. Otherwise, the function will return FALSE. Try using this
| function by typing: isTRUE(6 > 4)

> isTRUE(6 > 4)
[1] TRUE

| All that practice is paying off!

  |=======================================================                          |  67%
| Which of the following evaluates to TRUE?

1: isTRUE(!TRUE)
2: isTRUE(3)
3: !isTRUE(4 < 3)
4: !isTRUE(8 != 5)
5: isTRUE(NA)

Selection: 3

| Nice work!
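
The choices in the question above hint that isTRUE() is stricter than "the argument is truthy". A small sketch of the cases, using only base R, may help; the name `results` is illustrative.

```r
# isTRUE() returns TRUE only for a single, non-missing logical TRUE.
results <- c(
  plain_true = isTRUE(TRUE),           # TRUE
  a_number   = isTRUE(3),              # FALSE: 3 is numeric, not logical
  missing    = isTRUE(NA),             # FALSE: NA is not TRUE
  length_two = isTRUE(c(TRUE, TRUE))   # FALSE: the argument must have length one
)
print(results)
```

This strictness is what makes isTRUE() convenient inside if() conditions, where NA or a longer vector would otherwise cause an error.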

  |========================================================                         |  69%
| The function identical() will return TRUE if the two R objects passed to it as
| arguments are identical. Try out the identical() function by typing:
| identical('twins', 'twins')

> identical('twins', 'twins')
[1] TRUE

| All that hard work is paying off!

  |==========================================================                       |  71%
| Which of the following evaluates to TRUE?

1: identical('hello', 'Hello')
2: !identical(7, 7)
3: identical(5 > 4, 3 < 3.1)
4: identical(4, 3.1)

Selection: 3

| You are quite good my friend!
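
It can be useful to contrast identical() with the == operator. The sketch below uses only base R; the variable names are illustrative. identical() compares whole objects, including their type, and always returns a single TRUE or FALSE, whereas == compares element by element.

```r
same_double   <- identical(7, 7)              # TRUE
double_vs_int <- identical(7, 7L)             # FALSE: double versus integer storage
value_equal   <- 7 == 7L                      # TRUE: == compares the values
whole_vector  <- identical(c(1, 2), c(1, 2))  # TRUE: one answer for the whole object
elementwise   <- c(1, 2) == c(1, 3)           # TRUE FALSE: one answer per element

print(list(same_double, double_vs_int, value_equal, whole_vector, elementwise))
```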

  |===========================================================                      |  73%
| You should also be aware of the xor() function, which takes two arguments. The xor()
| function stands for exclusive OR. If one argument evaluates to TRUE and one argument
| evaluates to FALSE, then this function will return TRUE, otherwise it will return
| FALSE. Try out the xor() function by typing: xor(5 == 6, !FALSE)

> xor(5 == 6, !FALSE)
[1] TRUE

| That's correct!

  |=============================================================                    |  75%
| 5 == 6 evaluates to FALSE, !FALSE evaluates to TRUE, so xor(FALSE, TRUE) evaluates to
| TRUE. On the other hand if the first argument was changed to 5 == 5 and the second
| argument was unchanged then both arguments would have been TRUE, so xor(TRUE, TRUE)
| would have evaluated to FALSE.

...
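
The behaviour of xor() described above can be laid out as a full truth table. This is a minimal sketch; the names `a`, `b`, and `truth` are illustrative. It relies on xor() working element by element on logical vectors.

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

# xor() is TRUE exactly when one argument is TRUE and the other is FALSE.
truth <- data.frame(a = a, b = b, xor = xor(a, b))
print(truth)
```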

  |==============================================================                   |  77%
| Which of the following evaluates to FALSE?

1: xor(identical(xor, 'xor'), 7 == 7.0)
2: xor(4 >= 9, 8 != 8.0)
3: xor(!isTRUE(TRUE), 6 > -1)
4: xor(!!TRUE, !!FALSE)

Selection: 2

| You nailed it! Good job!

  |================================================================                 |  79%
| For the next few questions, we're going to need to create a vector of integers called
| ints. Create this vector by typing: ints <- sample(10)

> ints <- sample(10)

| Great job!

  |=================================================================                |  81%
| Now simply display the contents of ints.

> ints
 [1]  6  7  9  8  3  4  5  2  1 10

| That's the answer I was looking for.

  |===================================================================              |  83%
| The vector `ints` is a random sampling of integers from 1 to 10 without replacement.
| Let's say we wanted to ask some logical questions about contents of ints. If we type
| ints > 5, we will get a logical vector corresponding to whether each element of ints
| is greater than 5. Try typing: ints > 5

> ints > 5
 [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE

| You nailed it! Good job!

  |=====================================================================            |  85%
| We can use the resulting logical vector to ask other questions about ints. The which()
| function takes a logical vector as an argument and returns the indices of the vector
| that are TRUE. For example which(c(TRUE, FALSE, TRUE)) would return the vector c(1,
| 3).

...
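
A short sketch of how which() pairs with subsetting. Because sample(10) gives a different order on every run, this example fixes ints to the values shown in the log above; that fixed vector is an assumption made for reproducibility.

```r
ints <- c(6, 7, 9, 8, 3, 4, 5, 2, 1, 10)  # the order that appeared in this session

idx <- which(ints > 7)   # positions of the TRUE elements: 3, 4, 10
vals <- ints[idx]        # the values at those positions: 9, 8, 10

print(idx)
print(vals)
```

which() returns positions, not values, so it is the indices that are passed back into the square brackets.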

  |======================================================================           |  87%
| Use the which() function to find the indices of ints that are greater than 7.

> which(ints > 7)
[1]  3  4 10

| All that hard work is paying off!

  |========================================================================         |  88%
| Which of the following commands would produce the indices of the elements in ints that
| are less than or equal to 2?

1: which(ints < 2)
2: ints < 2
3: ints <= 2
4: which(ints <= 2)

Selection: 4

| You're the best!

  |=========================================================================        |  90%
| Like the which() function, the functions any() and all() take logical vectors as their
| argument. The any() function will return TRUE if one or more of the elements in the
| logical vector is TRUE. The all() function will return TRUE if every element in the
| logical vector is TRUE.

...
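
The difference between any() and all() can be seen side by side. This sketch again fixes ints to the values shown earlier in the log (an assumption, since sample(10) is random), and adds the empty-vector edge case, which is easy to get wrong.

```r
ints <- c(6, 7, 9, 8, 3, 4, 5, 2, 1, 10)

checks <- c(
  any_negative = any(ints < 0),     # FALSE: no element is below zero
  all_positive = all(ints > 0),     # TRUE: every element is above zero
  any_ten      = any(ints == 10),   # TRUE: at least one element equals 10
  all_ten      = all(ints == 10)    # FALSE: not every element equals 10
)
print(checks)

# Edge case: with no elements at all, any() is FALSE and all() is TRUE.
empty_any <- any(logical(0))
empty_all <- all(logical(0))
```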

  |===========================================================================      |  92%
| Use the any() function to see if any of the elements of ints are less than zero.

> any(ints < 0)
[1] FALSE

| Perseverance, that's the answer.

  |============================================================================     |  94%
| Use the all() function to see if all of the elements of ints are greater than zero.

> all(ints > 0)
[1] TRUE

| Excellent job!

  |==============================================================================   |  96%
| Which of the following evaluates to TRUE?

1: all(c(TRUE, FALSE, TRUE))
2: any(ints == 10)
3: all(ints == 10)
4: any(ints == 2.5)

Selection: 2

| Excellent work!

  |===============================================================================  |  98%
| That's all for this introduction to logic in R. If you really want to see what you can
| do with logic, check out the control flow lesson!

...

  |=================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Generate Code
2: Yes
3: No

Selection: 3

| You're the best!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 8.

6.4.9 Lesson 9: Workspace and Files

Same as R Programming 2.2.


| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 3

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 9

  |                                                                               |   0%

| In this lesson, you'll learn how to examine your local workspace in R and begin to
| explore the relationship between your workspace and the file system of your machine.

...

  |==                                                                             |   3%
| Because different operating systems have different conventions with regards to things
| like file paths, the outputs of these commands may vary across machines.

...

  |====                                                                           |   5%
| However it's important to note that R provides a common API (a common set of commands)
| for interacting with files, that way your code will work across different kinds of
| computers.

...

  |======                                                                         |   8%
| Let's jump right in so you can get a feel for how these special functions work!

...

  |========                                                                       |  11%
| Determine which directory your R session is using as its current working directory
| using getwd().

> getwd()

_(output omitted)_

| You are amazing!

  |==========                                                                     |  13%
| List all the objects in your local workspace using ls().

> ls()
 [1] "a"          "A"          "b"          "B"          "cnames"     
_(remaining output omitted)_

| All that practice is paying off!

  |============                                                                   |  16%
| Some R commands are the same as their equivalents commands on Linux or on a Mac. Both
| Linux and Mac operating systems are based on an operating system called Unix. It's
| always a good idea to learn more about Unix!

...

  |===============                                                                |  18%
| Assign 9 to x using x <- 9.

> x <- 9

| You are really on a roll!

  |=================                                                              |  21%
| Now take a look at objects that are in your workspace using ls().

> ls()
 [1] "a"          "A"          "b"          "B"          "cnames"     "x"
_(remaining output omitted)_

| All that practice is paying off!

  |===================                                                            |  24%
| List all the files in your working directory using list.files() or dir().

> list.files()
character(0)

| Great job!

  |=====================                                                          |  26%
| As we go through this lesson, you should be examining the help page for each new
| function. Check out the help page for list.files with the command ?list.files.

> ?list.files

| Excellent work!

  |=======================                                                        |  29%
| One of the most helpful parts of any R help file is the See Also section. Read that
| section for list.files. Some of these functions may be used in later portions of this
| lesson.

...

  |=========================                                                      |  32%
| Using the args() function on a function name is also a handy way to see what arguments
| a function can take.

...

  |===========================                                                    |  34%
| Use the args() function to determine the arguments to list.files().

> args(list.files)
function (path = ".", pattern = NULL, all.files = FALSE, full.names = FALSE, 
    recursive = FALSE, ignore.case = FALSE, include.dirs = FALSE, 
    no.. = FALSE) 
NULL

| You are quite good my friend!

  |=============================                                                  |  37%
| Assign the value of the current working directory to a variable called "old.dir".

> old.dir <- getwd()

| Excellent job!

  |===============================                                                |  39%
| We will use old.dir at the end of this lesson to move back to the place that we
| started. A lot of query functions like getwd() have the useful property that they
| return the answer to the question as a result of the function.

...

  |=================================                                              |  42%
| Use dir.create() to create a directory in the current working directory called
| "testdir".

> dir.create("testdir")

| That's the answer I was looking for.

  |===================================                                            |  45%
| We will do all our work in this new directory and then delete it after we are done.
| This is the R analog to "Take only pictures, leave only footprints."

...

  |=====================================                                          |  47%
| Set your working directory to "testdir" with the setwd() command.

> setwd("testdir")

| That's correct!

  |========================================                                       |  50%
| In general, you will want your working directory to be someplace sensible, perhaps
| created for the specific project that you are working on. In fact, organizing your
| work in R packages using RStudio is an excellent option. Check out RStudio at
| http://www.rstudio.com/

...

  |==========================================                                     |  53%
| Create a file in your working directory called "mytest.R" using the file.create()
| function.

> file.create("mytest.R")
[1] TRUE

| You are really on a roll!

  |============================================                                   |  55%
| This should be the only file in this newly created directory. Let's check this by
| listing all the files in the current directory.

> list.files()
[1] "mytest.R"

| Perseverance, that's the answer.

  |==============================================                                 |  58%
| Check to see if "mytest.R" exists in the working directory using the file.exists()
| function.

> file.exists("mytest.R")
[1] TRUE

| That's the answer I was looking for.

  |================================================                               |  61%
| These sorts of functions are excessive for interactive use. But, if you are running a
| program that loops through a series of files and does some processing on each one, you
| will want to check to see that each exists before you try to process it.

...
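
The loop scenario described above might look like the following sketch. The directory and file names are hypothetical and are created under tempdir() so that nothing in the working directory is touched.

```r
# Hypothetical scratch directory containing one real file.
work_dir <- file.path(tempdir(), "exists_demo")
dir.create(work_dir, showWarnings = FALSE)
file.create(file.path(work_dir, "a.R"))

# One path exists, the other does not.
candidates <- file.path(work_dir, c("a.R", "missing.R"))

processed <- character(0)
for (f in candidates) {
  if (!file.exists(f)) {
    message("skipping missing file: ", basename(f))
    next
  }
  # Real processing would go here; this sketch only records the name.
  processed <- c(processed, basename(f))
}
print(processed)

unlink(work_dir, recursive = TRUE)  # clean up the scratch directory
```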

  |==================================================                             |  63%
| Access information about the file "mytest.R" by using file.info().

> file.info("mytest.R")

_(output omitted)_

| You are amazing!

  |====================================================                           |  66%
| You can use the $ operator --- e.g., file.info("mytest.R")$mode --- to grab specific
| items.

...
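
The $ operator works here because file.info() returns a data frame with one row per file and one column per attribute. A minimal sketch, using a temporary file rather than mytest.R (the temporary file is an assumption made so the example runs anywhere):

```r
f <- tempfile(fileext = ".R")
writeLines("x <- 9", f)

info <- file.info(f)
print(class(info))   # "data.frame"
print(info$size)     # file size in bytes
print(info$isdir)    # FALSE, because f is a regular file
print(info$mtime)    # time of last modification

unlink(f)  # remove the temporary file
```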

  |======================================================                         |  68%
| Change the name of the file "mytest.R" to "mytest2.R" by using file.rename().

> file.rename("mytest.R", "mytest2.R")
[1] TRUE

| Great job!

  |========================================================                       |  71%
| Your operating system will provide simpler tools for these sorts of tasks, but having
| the ability to manipulate files programatically is useful. You might now try to delete
| mytest.R using file.remove('mytest.R'), but that won't work since mytest.R no longer
| exists. You have already renamed it.

...

  |==========================================================                     |  74%
| Make a copy of "mytest2.R" called "mytest3.R" using file.copy().

> file.copy("mytest2.R", "mytest3.R")
[1] TRUE

| You are amazing!

  |============================================================                   |  76%
| You now have two files in the current directory. That may not seem very interesting.
| But what if you were working with dozens, or millions, of individual files? In that
| case, being able to programatically act on many files would be absolutely necessary.
| Don't forget that you can, temporarily, leave the lesson by typing play() and then
| return by typing nxt().

...

  |==============================================================                 |  79%
| Provide the relative path to the file "mytest3.R" by using file.path().

> file.path("mytest3.R")
[1] "mytest3.R"

| That's a job well done!

  |================================================================               |  82%
| You can use file.path to construct file and directory paths that are independent of
| the operating system your R code is running on. Pass 'folder1' and 'folder2' as
| arguments to file.path to make a platform-independent pathname.

> file.path('folder1', 'folder2')
[1] "folder1/folder2"

| Your dedication is inspiring!

  |===================================================================            |  84%
| Take a look at the documentation for dir.create by entering ?dir.create . Notice the
| 'recursive' argument. In order to create nested directories, 'recursive' must be set
| to TRUE.

> ?dir.create

| Your dedication is inspiring!

  |=====================================================================          |  87%
| Create a directory in the current working directory called "testdir2" and a
| subdirectory for it called "testdir3", all in one command by using dir.create() and
| file.path().

> dir.create(file.path("testdir2", "testdir3"), recursive = TRUE)

| Keep up the great work!
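
The role of the 'recursive' argument can be seen by trying the call both ways. This sketch works under tempdir() instead of the lesson's working directory, and the name nested_demo is hypothetical. It also removes the directories afterwards with unlink(), a cleanup step that this log itself does not show.

```r
base   <- file.path(tempdir(), "nested_demo")
nested <- file.path(base, "testdir2", "testdir3")
unlink(base, recursive = TRUE)  # start from a clean state

# Without recursive = TRUE the parent directories are missing,
# so dir.create() warns and returns FALSE.
first_try <- suppressWarnings(dir.create(nested))

# With recursive = TRUE the whole chain is created in one call.
dir.create(nested, recursive = TRUE)
created <- dir.exists(nested)

# Remove everything again.
unlink(base, recursive = TRUE)
removed <- !dir.exists(base)

print(c(first_try = first_try, created = created, removed = removed))
```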

  |=======================================================================        |  89%
| Go back to your original working directory using setwd(). (Recall that we created the
| variable old.dir with the full path for the orginal working directory at the start of
| these questions.)

> setwd(old.dir)

| That's the answer I was looking for.

  |=========================================================================      |  92%
| It is often helpful to save the settings that you had before you began an analysis and
| then go back to them at the end. This trick is often used within functions; you save,
| say, the par() settings that you started with, mess around a bunch, and then set them
| back to the original values at the end. This isn't the same as what we have done here,
| but it seems similar enough to mention.

...
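
The save-and-restore pattern mentioned above is often written with on.exit(), so that the restore happens even if an error occurs. The following is a sketch only; run_in_dir is a hypothetical helper name, not a base R function.

```r
run_in_dir <- function(dir, expr) {
  old <- setwd(dir)                 # setwd() invisibly returns the previous directory
  on.exit(setwd(old), add = TRUE)   # restore it when the function exits
  expr                              # evaluated here, after the directory has changed
}

start  <- getwd()
inside <- run_in_dir(tempdir(), getwd())

print(basename(inside))         # the last component of the temporary directory
print(identical(getwd(), start))  # TRUE: the working directory was restored
```

The same shape works for graphical settings: `old <- par(...)` followed by `on.exit(par(old))`.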

  |===========================================================================    |  95%
| Take nothing but results. Leave nothing but assumptions. That sounds like 'Take
| nothing but pictures. Leave nothing but footprints.' But it makes no sense! Surely our
| readers can come up with a better motto . . .

...

  |=============================================================================  |  97%
| In this lesson, you learned how to examine your R workspace and work with the file
| system of your machine from within R. Thanks for playing!

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes
3: Generate Code

Selection: 1

| That's the answer I was looking for.

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 9.

6.4.10 Lesson 10: Reading Tabular Data

This lesson covers the readr package, which is one of the packages that make up the tidyverse.

  • Tidyverse: R packages for data science
  • readr Package: readr provides a fast and friendly way to read rectangular data (like csv, tsv, and fwf). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.
  • To use this package, run library(readr) or library(tidyverse).

| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 3

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 10

| Attempting to load lesson dependencies...

| Package ‘readr’ loaded correctly!

  |                                                                               |   0%

| The purpose of this lesson is to demonstrate some basic features of the 'readr'
| package in R for reading in tabular data.

...

  |=====                                                                          |   6%
| For some more details on the 'readr' package, you can refer to Section 1.3 of the book
| Mastering Software Development in R.

...

  |=========                                                                      |  12%
| For this lesson, we will be using a small dataset on zip codes in the United States.
| This table contains information about the percent of each zip codes that is considered
| urban by the U.S. Census Bureau.

...

  |==============                                                                 |  18%
| First load the 'readr' package using the library() function.

> library(readr)

| Excellent job!

  |===================                                                            |  24%
| The file that we will be reading into R is called 'urban.csv.gz'. It is a
| comma-separated-value (CSV) file compressed with gzip.

...

  |=======================                                                        |  29%
| The directory in which the 'urban.csv.gz' data file is located is saved in an R object
| called 'datapath'. Print out this object to see the path.

> print(datapath)
[1] 
_(output omitted)_

| Nice work!

  |============================                                                   |  35%
| Use the 'file.path()' function to construct the full path to the 'urban.csv.gz' file
| and store it in an object called 'datafile'

> datafile <- file.path(datapath, "urban.csv.gz")

| That's the answer I was looking for.

  |=================================                                              |  41%
| How would you read this file into R using the read_csv() function in the 'readr'
| function and save the resulting data frame to an object named 'urban'? Assume that the
| file is in your current working directory.

> urban <- read_csv(datafile)
Parsed with column specification:
cols(
  state = col_character(),
  zcta5 = col_double(),
  County = col_character(),
  PctUrban = col_double(),
  FIPS = col_character()
)

| Great job!

  |=====================================                                          |  47%
| Take a look at the first few rows of the 'urban' data frame object by calling the
| head() function on it

This displays the first six rows.

> head(urban)
# A tibble: 6 x 5
  state zcta5 County        PctUrban FIPS 
  <chr> <dbl> <chr>            <dbl> <chr>
1 01    35004 St. Clair AL      64   01115
2 01    35005 Jefferson AL      79.8 01073
3 01    35006 Jefferson AL       0   01073
4 01    35007 Shelby AL         90.5 01117
5 01    35010 Tallapoosa AL     45.2 01123
6 01    35013 Blount AL        100   01009

| All that practice is paying off!

  |==========================================                                     |  53%
| You'll see that there are 5 columns - state, zcta5 (a 5-digit zip code identifier),
| County (the name of the county in which the zip code resides), PctUrban (the
| percentage of the zip code that is considered urban), and FIPS (a 5-digit identifier
| for the county).

...

  |==============================================                                 |  59%
| Notice that the second column ("zcta5") was read in as an integer column. In most
| applications we will want this to be a character column.

...

  |===================================================                            |  65%
| How can we use the compact string representation of the 'col_types' argument to
| read_csv() to force the "zcta5" column to be character? Leave all of the other columns
| as the type that the currently are read in as and remember to save the output of
| read_csv() to an object named 'urban'.

> urban <- read_csv(datafile, col_types = cols(zcta5 = col_character()))

| That's not exactly what I'm looking for. Try again. Or, type info() for more options.

| The 'col_types' argument accepts a string argument containing letters representing the
| type of each column. For example, the string "ccc" says that the all three columns of
| a table should be "character".

> urban <- read_csv(datafile, col_types = "cccdc")

| Nice work!
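
In the compact string, each letter stands for one column in order: "c" for character and "d" for double, so "cccdc" describes five columns. The sketch below compares it with a longer cols() specification. The small CSV file is a hypothetical stand-in for urban.csv.gz, written to a temporary location, and the example assumes the readr package is installed.

```r
library(readr)

# Hypothetical two-row stand-in with the same five columns as the lesson's file.
csv <- tempfile(fileext = ".csv")
writeLines(c(
  "state,zcta5,County,PctUrban,FIPS",
  "01,35004,St. Clair AL,64,01115",
  "01,35005,Jefferson AL,79.8,01073"
), csv)

# Compact form: one letter per column.
compact <- read_csv(csv, col_types = "cccdc")

# Longer form: every column is character except PctUrban.
verbose <- read_csv(csv, col_types = cols(
  .default = col_character(),
  PctUrban = col_double()
))

print(sapply(compact, class))
```

Reading zcta5 as character keeps the zip code as text, so it is never treated as a number to be rounded or summed.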

  |========================================================                       |  71%
| It turns out we also don't need the last column of the data frame (the "FIPS") column.
| In order to save some space, we would like to tell read_csv() to not read that last
| column.

...

  |============================================================                   |  76%
| How can we tell read_csv() to not read in the last column of the table? Note that you
| can use the compact string representation of the 'col_types' argument for this too.
| Remember to save the output of read_csv() to an object named 'urban'. You can type
| 'play()' to play around in R before giving your answer.

> urban <- read_csv(datafile, col_types = "cccd-")

| You nailed it! Good job!

  |=================================================================              |  82%
| Take a look at the first few rows of the 'urban' data frame object by calling the
| head() function on it. Confirm that there are fewer columns now.

> head(urban)
# A tibble: 6 x 4
  state zcta5 County        PctUrban
  <chr> <chr> <chr>            <dbl>
1 01    35004 St. Clair AL      64  
2 01    35005 Jefferson AL      79.8
3 01    35006 Jefferson AL       0  
4 01    35007 Shelby AL         90.5
5 01    35010 Tallapoosa AL     45.2
6 01    35013 Blount AL        100  

The type of the zcta5 column has changed from <dbl> to <chr>.

| That's the answer I was looking for.

  |======================================================================         |  88%
| Finally, we may only need the first few rows of a table at first and so it might make
| sense to tell read_csv() to only read in a few rows rather than the entire dataset.

...

  |==========================================================================     |  94%
| Building on what we have done before, how would you additionally tell read_csv() to
| only read in the first 100 rows? Note that we still want to force the second column to
| be character and we do not want to read the last column.

> urban <- read_csv(datafile, col_types = "cccd-", n_max = 100)

| Keep working like that and you'll get there!
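
The effect of the "-" in the compact string and of n_max can be confirmed on a small file. As a sketch only: the CSV below is a hypothetical stand-in for urban.csv.gz with three data rows, and the example assumes the readr package is installed.

```r
library(readr)

csv <- tempfile(fileext = ".csv")
writeLines(c(
  "state,zcta5,County,PctUrban,FIPS",
  "01,35004,St. Clair AL,64,01115",
  "01,35005,Jefferson AL,79.8,01073",
  "01,35006,Jefferson AL,0,01073"
), csv)

# "-" in the fifth position skips the FIPS column;
# n_max = 2 stops reading after two data rows.
urban_small <- read_csv(csv, col_types = "cccd-", n_max = 2)

print(dim(urban_small))    # 2 rows, 4 columns
print(names(urban_small))  # FIPS is absent
```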

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes
3: Generate Code

Selection: 1

| You are quite good my friend!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 10.

6.4.11 Lesson 11: Looking at Data

Same as R Programming 2.12.


| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: The R Programming Environment
4: Take me to the swirl course repository!

Selection: 3

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 11

  |                                                                               |   0%

| Whenever you're working with a new dataset, the first thing you should do is look at
| it! What is the format of the data? What are the dimensions? What are the variable
| names? How are the variables stored? Are there missing data? Are there any flaws in
| the data?

...

  |===                                                                            |   4%
| This lesson will teach you how to answer these questions and more using R's built-in
| functions. We'll be using a dataset constructed from the United States Department of
| Agriculture's PLANTS Database (http://plants.usda.gov/adv_search.html).

...

  |======                                                                         |   8%
| I've stored the data for you in a variable called plants. Type ls() to list the
| variables in your workspace, among which should be plants.

> ls()
 [1] "a"          "A"          "b"          "B"          "cnames"     "d"         
(omitted)
[37] "patients"   "plants"     "presents"   "Q"          "s"          "t"         
(omitted)

| Excellent work!

  |=========                                                                      |  12%
| Let's begin by checking the class of the plants variable with class(plants). This will
| give us a clue as to the overall structure of the data.

> class(plants)
[1] "data.frame"

| That's a job well done!

  |=============                                                                  |  16%
| It's very common for data to be stored in a data frame. It is the default class for
| data read into R using functions like read.csv() and read.table(), which you'll learn
| about in another lesson.

...

  |================                                                               |  20%
| Since the dataset is stored in a data frame, we know it is rectangular. In other
| words, it has two dimensions (rows and columns) and fits neatly into a table or
| spreadsheet. Use dim(plants) to see exactly how many rows and columns we're dealing
| with.

> dim(plants)
[1] 5166   10

| You are doing so well!

  |===================                                                            |  24%
| The first number you see (5166) is the number of rows (observations) and the second
| number (10) is the number of columns (variables).

...

  |======================                                                         |  28%
| You can also use nrow(plants) to see only the number of rows. Try it out.

> nrow(plants)
[1] 5166

| Nice work!

  |=========================                                                      |  32%
| ... And ncol(plants) to see only the number of columns.

> ncol(plants)
[1] 10

| Perseverance, that's the answer.
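
The plants data frame is supplied by swirl and is not available outside the lesson, so the sketch below uses R's built-in iris data frame as a stand-in (an assumption made purely for illustration). It shows that dim() is simply nrow() and ncol() combined.

```r
dims <- dim(iris)   # rows first, then columns

print(dims)         # 150 rows and 5 columns for iris
print(nrow(iris))   # the same as dims[1]
print(ncol(iris))   # the same as dims[2]

stopifnot(identical(dims, c(nrow(iris), ncol(iris))))
```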

  |============================                                                   |  36%
| If you are curious as to how much space the dataset is occupying in memory, you can
| use object.size(plants).

> object.size(plants)
686080 bytes

| You are quite good my friend!

  |================================                                               |  40%
| Now that we have a sense of the shape and size of the dataset, let's get a feel for
| what's inside. names(plants) will return a character vector of column (i.e. variable)
| names. Give it a shot.

> names(plants)
 [1] "Scientific_Name"      "Duration"             "Active_Growth_Period"
 [4] "Foliage_Color"        "pH_Min"               "pH_Max"              
 [7] "Precip_Min"           "Precip_Max"           "Shade_Tolerance"     
[10] "Temp_Min_F"          

| Excellent job!

  |===================================                                            |  44%
| We've applied fairly descriptive variable names to this dataset, but that won't always
| be the case. A logical next step is to peek at the actual data. However, our dataset
| contains over 5000 observations (rows), so it's impractical to view the whole thing
| all at once.

...

  |======================================                                         |  48%
| The head() function allows you to preview the top of the dataset. Give it a try with
| only one argument.

> head(plants)
               Scientific_Name          Duration Active_Growth_Period Foliage_Color
1                  Abelmoschus              <NA>                 <NA>          <NA>
2       Abelmoschus esculentus Annual, Perennial                 <NA>          <NA>
3                        Abies              <NA>                 <NA>          <NA>
4               Abies balsamea         Perennial    Spring and Summer         Green
5 Abies balsamea var. balsamea         Perennial                 <NA>          <NA>
6                     Abutilon              <NA>                 <NA>          <NA>
  pH_Min pH_Max Precip_Min Precip_Max Shade_Tolerance Temp_Min_F
1     NA     NA         NA         NA            <NA>         NA
2     NA     NA         NA         NA            <NA>         NA
3     NA     NA         NA         NA            <NA>         NA
4      4      6         13         60        Tolerant        -43
5     NA     NA         NA         NA            <NA>         NA
6     NA     NA         NA         NA            <NA>         NA

| Excellent work!

  |=========================================                                      |  52%
| Take a minute to look through and understand the output above. Each row is labeled
| with the observation number and each column with the variable name. Your screen is
| probably not wide enough to view all 10 columns side-by-side, in which case R displays
| as many columns as it can on each line before continuing on the next.

...

  |============================================                                   |  56%
| By default, head() shows you the first six rows of the data. You can alter this
| behavior by passing as a second argument the number of rows you'd like to view. Use
| head() to preview the first 10 rows of plants.

> head(plants, 10)
                     Scientific_Name          Duration Active_Growth_Period
1                        Abelmoschus              <NA>                 <NA>
2             Abelmoschus esculentus Annual, Perennial                 <NA>
3                              Abies              <NA>                 <NA>
4                     Abies balsamea         Perennial    Spring and Summer
5       Abies balsamea var. balsamea         Perennial                 <NA>
6                           Abutilon              <NA>                 <NA>
7               Abutilon theophrasti            Annual                 <NA>
8                             Acacia              <NA>                 <NA>
9                  Acacia constricta         Perennial    Spring and Summer
10 Acacia constricta var. constricta         Perennial                 <NA>
   Foliage_Color pH_Min pH_Max Precip_Min Precip_Max Shade_Tolerance Temp_Min_F
1           <NA>     NA     NA         NA         NA            <NA>         NA
2           <NA>     NA     NA         NA         NA            <NA>         NA
3           <NA>     NA     NA         NA         NA            <NA>         NA
4          Green      4    6.0         13         60        Tolerant        -43
5           <NA>     NA     NA         NA         NA            <NA>         NA
6           <NA>     NA     NA         NA         NA            <NA>         NA
7           <NA>     NA     NA         NA         NA            <NA>         NA
8           <NA>     NA     NA         NA         NA            <NA>         NA
9          Green      7    8.5          4         20      Intolerant        -13
10          <NA>     NA     NA         NA         NA            <NA>         NA

| That's correct!

  |===============================================                                |  60%
| The same applies for using tail() to preview the end of the dataset. Use tail() to
| view the last 15 rows.

> tail(plants, 15)
                      Scientific_Name  Duration Active_Growth_Period Foliage_Color
5152                          Zizania      <NA>                 <NA>          <NA>
5153                 Zizania aquatica    Annual               Spring         Green
5154   Zizania aquatica var. aquatica    Annual                 <NA>          <NA>
5155                Zizania palustris    Annual                 <NA>          <NA>
5156 Zizania palustris var. palustris    Annual                 <NA>          <NA>
5157                      Zizaniopsis      <NA>                 <NA>          <NA>
5158             Zizaniopsis miliacea Perennial    Spring and Summer         Green
5159                            Zizia      <NA>                 <NA>          <NA>
5160                     Zizia aptera Perennial                 <NA>          <NA>
5161                      Zizia aurea Perennial                 <NA>          <NA>
5162                 Zizia trifoliata Perennial                 <NA>          <NA>
5163                          Zostera      <NA>                 <NA>          <NA>
5164                   Zostera marina Perennial                 <NA>          <NA>
5165                           Zoysia      <NA>                 <NA>          <NA>
5166                  Zoysia japonica Perennial                 <NA>          <NA>
     pH_Min pH_Max Precip_Min Precip_Max Shade_Tolerance Temp_Min_F
5152     NA     NA         NA         NA            <NA>         NA
5153    6.4    7.4         30         50      Intolerant         32
5154     NA     NA         NA         NA            <NA>         NA
5155     NA     NA         NA         NA            <NA>         NA
5156     NA     NA         NA         NA            <NA>         NA
5157     NA     NA         NA         NA            <NA>         NA
5158    4.3    9.0         35         70      Intolerant         12
5159     NA     NA         NA         NA            <NA>         NA
5160     NA     NA         NA         NA            <NA>         NA
5161     NA     NA         NA         NA            <NA>         NA
5162     NA     NA         NA         NA            <NA>         NA
5163     NA     NA         NA         NA            <NA>         NA
5164     NA     NA         NA         NA            <NA>         NA
5165     NA     NA         NA         NA            <NA>         NA
5166     NA     NA         NA         NA            <NA>         NA

| Excellent work!
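
The second argument of head() and tail() can be tried on any data frame. The sketch below uses a small made-up data frame `df` rather than `plants`. The negative value goes slightly beyond what the lesson covers.

```r
# `df` is a hypothetical 20-row data frame used only for illustration
df <- data.frame(x = 1:20, y = letters[1:20])

head(df, 3)    # first 3 rows
tail(df, 3)    # last 3 rows (row labels 18, 19, 20 are kept)
head(df, -17)  # a negative n drops rows from the end: all but the last 17
```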

  |===================================================                            |  64%
| After previewing the top and bottom of the data, you probably noticed lots of NAs,
| which are R's placeholders for missing values. Use summary(plants) to get a better
| feel for how each variable is distributed and how much of the dataset is missing.

> summary(plants)
                     Scientific_Name              Duration   
 Abelmoschus                 :   1   Perennial        :3031  
 Abelmoschus esculentus      :   1   Annual           : 682  
 Abies                       :   1   Annual, Perennial: 179  
 Abies balsamea              :   1   Annual, Biennial :  95  
 Abies balsamea var. balsamea:   1   Biennial         :  57  
 Abutilon                    :   1   (Other)          :  92  
 (Other)                     :5160   NA's             :1030  
           Active_Growth_Period      Foliage_Color      pH_Min          pH_Max      
 Spring and Summer   : 447      Dark Green  :  82   Min.   :3.000   Min.   : 5.100  
 Spring              : 144      Gray-Green  :  25   1st Qu.:4.500   1st Qu.: 7.000  
 Spring, Summer, Fall:  95      Green       : 692   Median :5.000   Median : 7.300  
 Summer              :  92      Red         :   4   Mean   :4.997   Mean   : 7.344  
 Summer and Fall     :  24      White-Gray  :   9   3rd Qu.:5.500   3rd Qu.: 7.800  
 (Other)             :  30      Yellow-Green:  20   Max.   :7.000   Max.   :10.000  
 NA's                :4334      NA's        :4334   NA's   :4327    NA's   :4327    
   Precip_Min      Precip_Max         Shade_Tolerance   Temp_Min_F    
 Min.   : 4.00   Min.   : 16.00   Intermediate: 242   Min.   :-79.00  
 1st Qu.:16.75   1st Qu.: 55.00   Intolerant  : 349   1st Qu.:-38.00  
 Median :28.00   Median : 60.00   Tolerant    : 246   Median :-33.00  
 Mean   :25.57   Mean   : 58.73   NA's        :4329   Mean   :-22.53  
 3rd Qu.:32.00   3rd Qu.: 60.00                       3rd Qu.:-18.00  
 Max.   :60.00   Max.   :200.00                       Max.   : 52.00  
 NA's   :4338    NA's   :4338                         NA's   :4328    

| That's a job well done!

  |======================================================                         |  68%
| summary() provides different output for each variable, depending on its class. For
| numeric data such as Precip_Min, summary() displays the minimum, 1st quartile, median,
| mean, 3rd quartile, and maximum. These values help us understand how the data are
| distributed.

...

  |=========================================================                      |  72%
| For categorical variables (called 'factor' variables in R), summary() displays the
| number of times each value (or 'level') occurs in the data. For example, each value of
| Scientific_Name only appears once, since it is unique to a specific plant. In
| contrast, the summary for Duration (also a factor variable) tells us that our dataset
| contains 3031 Perennial plants, 682 Annual plants, etc.

...
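
The class-dependent behaviour of summary() described above can be seen on a tiny example. The data frame `df` and its columns are made up for illustration and are not part of the lesson.

```r
# Hypothetical data: one numeric column and one factor column, each with an NA
df <- data.frame(
  rain  = c(4, 16, NA, 28, 60),
  shade = factor(c("Tolerant", "Intolerant", NA, "Tolerant", "Tolerant"))
)

summary(df$rain)   # min, quartiles, median, mean, max, plus the NA count
summary(df$shade)  # number of rows per level, plus the NA count
summary(df)        # both at once, one block per column
```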

  |============================================================                   |  76%
| You can see that R truncated the summary for Active_Growth_Period by including a
| catch-all category called 'Other'. Since it is a categorical/factor variable, we can
| see how many times each value actually occurs in the data with
| table(plants$Active_Growth_Period).

> table(plants$Active_Growth_Period)

Fall, Winter and Spring                  Spring         Spring and Fall 
                     15                     144                      10 
      Spring and Summer    Spring, Summer, Fall                  Summer 
                    447                      95                      92 
        Summer and Fall              Year Round 
                     24                       5 

| All that practice is paying off!
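
Note that the table() output above omits the 4334 missing values that summary() reported. A short sketch, on a made-up factor `period`, of how the `useNA` argument brings them back and how sort() orders the counts:

```r
# `period` is a hypothetical factor with two missing values
period <- factor(c("Spring", "Summer", NA, "Spring", NA, "Spring"))

table(period)                           # NAs are dropped by default
table(period, useNA = "ifany")          # adds a count for NA when any exist
sort(table(period), decreasing = TRUE)  # most frequent level first
```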

  |===============================================================                |  80%
| Each of the functions we've introduced so far has its place in helping you to better
| understand the structure of your data. However, we've left the best for last....

...

  |==================================================================             |  84%
| Perhaps the most useful and concise function for understanding the *str*ucture of your
| data is str(). Give it a try now.

> str(plants)
'data.frame':   5166 obs. of  10 variables:
 $ Scientific_Name     : Factor w/ 5166 levels "Abelmoschus",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Duration            : Factor w/ 8 levels "Annual","Annual, Biennial",..: NA 4 NA 7 7 NA 1 NA 7 7 ...
 $ Active_Growth_Period: Factor w/ 8 levels "Fall, Winter and Spring",..: NA NA NA 4 NA NA NA NA 4 NA ...
 $ Foliage_Color       : Factor w/ 6 levels "Dark Green","Gray-Green",..: NA NA NA 3 NA NA NA NA 3 NA ...
 $ pH_Min              : num  NA NA NA 4 NA NA NA NA 7 NA ...
 $ pH_Max              : num  NA NA NA 6 NA NA NA NA 8.5 NA ...
 $ Precip_Min          : int  NA NA NA 13 NA NA NA NA 4 NA ...
 $ Precip_Max          : int  NA NA NA 60 NA NA NA NA 20 NA ...
 $ Shade_Tolerance     : Factor w/ 3 levels "Intermediate",..: NA NA NA 3 NA NA NA NA 2 NA ...
 $ Temp_Min_F          : int  NA NA NA -43 NA NA NA NA -13 NA ...

| You are quite good my friend!

  |======================================================================         |  88%
| The beauty of str() is that it combines many of the features of the other functions
| you've already seen, all in a concise and readable format. At the very top, it tells
| us that the class of plants is 'data.frame' and that it has 5166 observations and 10
| variables. It then gives us the name and class of each variable, as well as a preview
| of its contents.

...

  |=========================================================================      |  92%
| str() is actually a very general function that you can use on most objects in R. Any
| time you want to understand the structure of something (a dataset, function, etc.),
| str() is a good place to start.

...
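
As a quick illustration of that generality, str() can be called on objects that are not data frames. The objects below are arbitrary examples, not taken from the lesson.

```r
str(1:5)                    # an integer vector
str(list(a = 1, b = "x"))   # a list: one line per element
str(mean)                   # a function: shows its arguments
```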

  |============================================================================   |  96%
| In this lesson, you learned how to get a feel for the structure and contents of a new
| dataset using a collection of simple and useful functions. Taking the time to do this
| upfront can save you time and frustration later on in your analysis.

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Generate Code
2: No
3: Yes

Selection: 2

| You are amazing!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 11.

6.4.12 Lesson 12: Data Manipulation

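This lesson covers how to use the dplyr and tidyr packages, both of which belong to the tidyverse family of packages. swirl calls the topic Data Manipulation; it is also known as data munging or data wrangling, and refers to the reshaping of data that is indispensable in data science.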

  • Tidyverse: R packages for data science
  • dplyr Package
    • dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:
    • mutate() adds new variables that are functions of existing variables
    • select() picks variables based on their names
    • filter() picks cases based on their values
    • summarise() reduces multiple values down to a single summary
    • arrange() changes the ordering of the rows.
  • tidyr Package
    • tidyr provides a set of functions that help you get to tidy data. Tidy data is data with a consistent form: in brief, every variable goes in a column, and every column is a variable.
  • Tidy data
  • To use these packages, run library(tidyr) and library(dplyr), or library(tidyverse).
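
The five dplyr verbs listed above can be sketched on a tiny data frame. The data frame `scores` and its columns are invented for illustration and are not part of the swirl lesson.

```r
library(dplyr)

# Hypothetical data used only to show the verbs
scores <- data.frame(
  name   = c("a", "b", "c", "d"),
  team   = c("x", "x", "y", "y"),
  points = c(10, 25, 5, 40),
  stringsAsFactors = FALSE
)

top <- scores %>%
  filter(points >= 10) %>%         # keep rows by value
  mutate(double = points * 2) %>%  # add a column computed from another
  select(name, team, double) %>%   # keep columns by name
  arrange(desc(double))            # reorder rows, largest first
top

# summarise() collapses many rows into a single summary row
overall <- scores %>% summarise(mean_points = mean(points))
overall
```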

Lesson 12 is somewhat difficult, so make good use of Help. If you are completely lost, either study an overview of the packages before trying it, or use skip() and check the answers.

> swirl()

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you
| did then. If you are new, call yourself something unique.

What shall I call you? SL

| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: No. Let me start something new.

Selection: 2

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: The R Programming Environment
3: Take me to the swirl course repository!

Selection: 2

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 12

| Attempting to load lesson dependencies...

| Package ‘dplyr’ loaded correctly!

| Package ‘tidyr’ loaded correctly!

| This lesson requires the ‘titanic’ package. Would you like me to install it for you
| now?

1: Yes
2: No

Selection: 1

| Trying to install package ‘titanic’ now...

| Package ‘titanic’ loaded correctly!

| This lesson requires the ‘faraway’ package. Would you like me to install it for you
| now?

1: Yes
2: No

Selection: 1

| Trying to install package ‘faraway’ now...
also installing the dependencies ‘minqa’, ‘nloptr’, ‘statmod’, ‘RcppEigen’, ‘lme4’


| Package ‘faraway’ loaded correctly!

  |                                                                               |   0%

| I just opened an R script for you, and I have loaded a dataset called `worldcup`. For
| this question, you will be adding a function from `dplyr` or `tidyr` to the pipe chain
| in the script. Your goal is to re-create the example output shown in the comments of
| the script. When you are ready to submit your script, save the script and type
| `submit()`, or type `reset()` to reset the script to its original state.

Contents of step1_wc.R

# Add a `dplyr` or `tidyr` function to the pipe chain in the code 
# at the bottom of this script to subset the `worldcup` dataset to 
# four columns, so that the first lines of the resulting data 
# frame (`wc_1`) look like this: 
#
##           Time   Passes  Tackles Saves
## Abdoun      16        6        0     0
## Abe        351      101       14     0
## Abidal     180       91        6     0
## Abou Diaby 270       111       5     0
#
# I have already loaded the `worldcup` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

wc_1 <- worldcup %>% 

As the instructions suggest, inspect the contents of the worldcup data frame.

> str(worldcup)
'data.frame':   595 obs. of  7 variables:
 $ Team    : Factor w/ 32 levels "Algeria","Argentina",..: 1 16 9 9 5 32 11 11 18 20 ...
 $ Position: Factor w/ 4 levels "Defender","Forward",..: 4 4 1 4 2 2 1 2 4 1 ...
 $ Time    : int  16 351 180 270 46 72 138 33 21 103 ...
 $ Shots   : int  0 0 0 1 2 0 0 0 5 0 ...
 $ Passes  : int  6 101 91 111 16 15 51 9 22 38 ...
 $ Tackles : int  0 14 6 5 0 0 2 0 0 1 ...
 $ Saves   : int  0 0 0 0 0 0 0 0 0 0 ...
> head(worldcup)
               Team   Position Time Shots Passes Tackles Saves
Abdoun      Algeria Midfielder   16     0      6       0     0
Abe           Japan Midfielder  351     0    101      14     0
Abidal       France   Defender  180     0     91       6     0
Abou Diaby   France Midfielder  270     1    111       5     0
Aboubakar  Cameroon    Forward   46     2     16       0     0
Abreu       Uruguay    Forward   72     0     15       0     0

Once you have a rough idea of the contents, add one line at the end of the script so that it reads as follows.

Contents of step1_wc.R after editing

# Add a `dplyr` or `tidyr` function to the pipe chain in the code 
# at the bottom of this script to subset the `worldcup` dataset to 
# four columns, so that the first lines of the resulting data 
# frame (`wc_1`) look like this: 
#
##           Time   Passes  Tackles Saves
## Abdoun      16        6        0     0
## Abe        351      101       14     0
## Abidal     180       91        6     0
## Abou Diaby 270       111       5     0
#
# I have already loaded the `worldcup` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

wc_1 <- worldcup %>% 
  select(Time, Passes, Tackles, Saves)

The last line is the one that was added. When you have finished editing, save step1_wc.R and type submit() in the Console.

> submit()

| Sourcing your script...


| Your dedication is inspiring!
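
select() accepts more than a plain list of names. A short sketch on a two-row data frame `players`, made up here to resemble `worldcup` (which needs the faraway package):

```r
library(dplyr)

# Hypothetical stand-in for a few rows and columns of `worldcup`
players <- data.frame(
  Team   = c("Algeria", "Japan"),
  Time   = c(16L, 351L),
  Passes = c(6L, 101L),
  row.names = c("Abdoun", "Abe"),
  stringsAsFactors = FALSE
)

by_name  <- players %>% select(Time, Passes)  # keep columns by name
by_drop  <- players %>% select(-Team)         # drop one column
by_range <- players %>% select(Time:Passes)   # keep a contiguous range
by_name
```

All three calls return the same two columns here; which form reads best depends on how many columns are kept versus dropped.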

  |==========                                                                     |  12%
| In this question, you will continue to build on the data cleaning you started in the
| last question. I opened a new R script for you with the previous question completed.
| As a next step, you now need to reduce the data frame to show the mean of each column
| selected in the previous step. To do this, add a function from `dplyr` or `tidyr` to
| the pipe chain in the script. Your goal is to re-create the example output shown in
| the comments of the script. When you are ready to submit your script, save the script
| and type `submit()`, or type `reset()` to reset the script to its original state.

Contents of step2_wc.R before editing

# After the previous question, you should have transformed the 
# `worldcup` data to look like this:
#
##           Time   Passes  Tackles Saves
## Abdoun      16        6        0     0
## Abe        351      101       14     0
## Abidal     180       91        6     0
## Abou Diaby 270       111       5     0
#
# Now add a `dplyr` or `tidyr` function to the pipe chain in the 
# code at the bottom of this script so that `wc_2` data frame looks 
# like this, with four columns, and a single observation (the 
# mean value of each variable): 
#
##       Time   Passes  Tackles     Saves
##   208.8639 84.52101 4.191597 0.6672269
#
# I have already loaded the `worldcup` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

wc_2 <- worldcup %>% 
  select(Time, Passes, Tackles, Saves) %>%

Check in the Console that the step to be added actually produces the specified output.

> wc_2 <- worldcup %>% 
+     select(Time, Passes, Tackles, Saves) %>%
+     summarize(Time = mean(Time), 
+               Passes = mean(Passes),
+               Tackles = mean(Tackles),
+               Saves = mean(Saves))
> wc_2
      Time   Passes  Tackles     Saves
1 208.8639 84.52101 4.191597 0.6672269

Contents of step2_wc.R after editing

# After the previous question, you should have transformed the 
# `worldcup` data to look like this:
#
##           Time   Passes  Tackles Saves
## Abdoun      16        6        0     0
## Abe        351      101       14     0
## Abidal     180       91        6     0
## Abou Diaby 270       111       5     0
#
# Now add a `dplyr` or `tidyr` function to the pipe chain in the 
# code at the bottom of this script so that `wc_2` data frame looks 
# like this, with four columns, and a single observation (the 
# mean value of each variable): 
#
##       Time   Passes  Tackles     Saves
##   208.8639 84.52101 4.191597 0.6672269
#
# I have already loaded the `worldcup` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

wc_2 <- worldcup %>% 
  select(Time, Passes, Tackles, Saves) %>%
  summarize(Time = mean(Time), 
            Passes = mean(Passes),
            Tackles = mean(Tackles),
            Saves = mean(Saves))

The part added from summarize onward can also be written as follows without any problem. Only the code portion is shown.

wc_2 <- worldcup %>% 
  select(Time, Passes, Tackles, Saves) %>%
  summarize_all(mean)

When you have finished editing, save step2_wc.R and run submit().

> submit()

| Sourcing your script...


| Nice work!
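
In dplyr 1.0.0 and later, summarize_all() is marked as superseded in favour of across(). A sketch of the newer spelling on a made-up data frame `stats`; whether swirl's answer check accepts it was not tested here.

```r
library(dplyr)

# Hypothetical numeric data standing in for the selected worldcup columns
stats <- data.frame(Time = c(16, 351, 180), Passes = c(6, 101, 91))

# Apply mean() to every column; the result has one row
means <- stats %>% summarize(across(everything(), mean))
means
```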

  |====================                                                           |  25%
| In this question, you will again continue to build on the data cleaning you started in
| the last two questions. I opened a new R script for you with the previous steps
| completed. As a next step, you now need to rearrange the data frame to have one column
| with each variable's name and one column with mean values for each variable. To do
| this, add a function from `dplyr` or `tidyr` to the pipe chain in the script. Your
| goal is to re-create the example output shown in the comments of the script. When you
| are ready to submit your script, save the script and type `submit()`, or type
| `reset()` to reset the script to its original state.

Contents of step3_wc.R before editing

# After the previous question, you should have transformed the 
# `worldcup` data to look like this:
#
##       Time   Passes  Tackles     Saves
##   208.8639 84.52101 4.191597 0.6672269
#
# Now add a `dplyr` or `tidyr` function to the pipe chain in the 
# code at the bottom of this script so that the `wc_3` data frame looks 
# like this, with variable names in one column and the mean value
# of each variable in another column: 
#
##      var           mean
##     Time    208.8638655
##   Passes     84.5210084
##  Tackles      4.1915966
##    Saves      0.6672269
#
# I have already loaded the `worldcup` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

wc_3 <- worldcup %>% 
  select(Time, Passes, Tackles, Saves) %>%
  summarize(Time = mean(Time),
            Passes = mean(Passes),
            Tackles = mean(Tackles),
            Saves = mean(Saves)) %>%

Check in the Console that the step to be added actually produces the specified output.

> wc_3 <- worldcup %>% 
+     select(Time, Passes, Tackles, Saves) %>%
+     summarize(Time = mean(Time),
+               Passes = mean(Passes),
+               Tackles = mean(Tackles),
+               Saves = mean(Saves)) %>%
+     gather(key = "var", value = "mean")
> wc_3
      var        mean
1    Time 208.8638655
2  Passes  84.5210084
3 Tackles   4.1915966
4   Saves   0.6672269

Contents of step3_wc.R after editing

# After the previous question, you should have transformed the 
# `worldcup` data to look like this:
#
##       Time   Passes  Tackles     Saves
##   208.8639 84.52101 4.191597 0.6672269
#
# Now add a `dplyr` or `tidyr` function to the pipe chain in the 
# code at the bottom of this script so that the `wc_3` data frame looks 
# like this, with variable names in one column and the mean value
# of each variable in another column: 
#
##      var           mean
##     Time    208.8638655
##   Passes     84.5210084
##  Tackles      4.1915966
##    Saves      0.6672269
#
# I have already loaded the `worldcup` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

wc_3 <- worldcup %>% 
  select(Time, Passes, Tackles, Saves) %>%
  summarize(Time = mean(Time),
            Passes = mean(Passes),
            Tackles = mean(Tackles),
            Saves = mean(Saves)) %>%
  gather(key = "var", value = "mean")

When you have finished editing, save step3_wc.R and run submit().

> submit()

| Sourcing your script...


| You are quite good my friend!
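
gather() still works, but since tidyr 1.0.0 it is superseded by pivot_longer(). A sketch of the equivalent reshaping on a made-up one-row data frame `means`; whether swirl's answer check accepts this form was not tested here, and the result is a tibble rather than a plain data frame.

```r
library(dplyr)
library(tidyr)

# Hypothetical one-row summary, similar in shape to the previous step's output
means <- data.frame(Time = 208.9, Passes = 84.5, Tackles = 4.2, Saves = 0.7)

# Turn every column into a (var, mean) pair, one row per original column
long <- means %>%
  pivot_longer(everything(), names_to = "var", values_to = "mean")
long
```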

  |==============================                                                 |  38%
| In this question, you will continue to build on the data cleaning in the last three
| questions. I opened a new R script for you with the previous steps completed. As a
| next step, you now round the mean values to one decimal point. To do this, add a
| function from `dplyr` or `tidyr` to the pipe chain in the script. Your goal is to
| re-create the example output shown in the comments of the script. When you are ready
| to submit your script, save the script and type `submit()`, or type `reset()` to reset
| the script to its original state.

Contents of step4_wc.R before editing

# After the previous question, you should have transformed the `worldcup`
# data to look like this:
#
##      var           mean
##     Time    208.8638655
##   Passes     84.5210084
##  Tackles      4.1915966
##    Saves      0.6672269
#
# Now add a `dplyr` or `tidyr` function to the pipe chain in the 
# code at the bottom of this script so that the `wc_4` data frame looks 
# like this, with variable means rounded to one decimal place: 
#
##      var     mean
##     Time    208.9
##   Passes     84.5
##  Tackles      4.2
##    Saves      0.7
#
# I have already loaded the `worldcup` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

wc_4 <- worldcup %>% 
  select(Time, Passes, Tackles, Saves) %>%
  summarize(Time = mean(Time),
            Passes = mean(Passes),
            Tackles = mean(Tackles),
            Saves = mean(Saves)) %>%
  gather(var, mean) %>%

Check in the Console that the step to be added actually produces the specified output.

> wc_4 <- worldcup %>% 
+     select(Time, Passes, Tackles, Saves) %>%
+     summarize(Time = mean(Time),
+               Passes = mean(Passes),
+               Tackles = mean(Tackles),
+               Saves = mean(Saves)) %>%
+     gather(var, mean) %>%
+     mutate(mean = round(mean, digits = 1))
> wc_4
      var  mean
1    Time 208.9
2  Passes  84.5
3 Tackles   4.2
4   Saves   0.7

Contents of step4_wc.R after editing

# After the previous question, you should have transformed the `worldcup`
# data to look like this:
#
##      var           mean
##     Time    208.8638655
##   Passes     84.5210084
##  Tackles      4.1915966
##    Saves      0.6672269
#
# Now add a `dplyr` or `tidyr` function to the pipe chain in the 
# code at the bottom of this script so that the `wc_4` data frame looks 
# like this, with variable means rounded to one decimal place: 
#
##      var     mean
##     Time    208.9
##   Passes     84.5
##  Tackles      4.2
##    Saves      0.7
#
# I have already loaded the `worldcup` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

wc_4 <- worldcup %>% 
  select(Time, Passes, Tackles, Saves) %>%
  summarize(Time = mean(Time),
            Passes = mean(Passes),
            Tackles = mean(Tackles),
            Saves = mean(Saves)) %>%
  gather(var, mean) %>%
  mutate(mean = round(mean, digits = 1))

When you have finished editing, save step4_wc.R and run submit().

> submit()

| Sourcing your script...

| You are amazing!
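
mutate() overwrites a column when the new name matches an existing one, which is what the rounding step relies on. A minimal sketch on a made-up two-row data frame `long`; the column `raw` is added only to show that a new name keeps the original values alongside.

```r
library(dplyr)

# Hypothetical long-format data, similar in shape to the previous step's output
long <- data.frame(
  var  = c("Time", "Saves"),
  mean = c(208.8638655, 0.6672269),
  stringsAsFactors = FALSE
)

rounded <- long %>%
  mutate(raw  = mean,                      # new name: adds a column
         mean = round(mean, digits = 1))   # existing name: replaces the column
rounded
```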

  |========================================                                       |  50%
| In this question, we'll be doing similar data cleaning with a new dataset. I have
| loaded a dataset called `titanic` for you. For this question, you need to subset the
| original dataframe down to four columns. Your goal is to re-create the example output
| shown in the comments of the script by adding a `dplyr` or `tidyr` function to the
| pipe chain.  When you are ready to submit your script, save the script and type
| `submit()`, or type `reset()` to reset the script to its original state.

Contents of step1_titanic.R before editing

# Add a `dplyr` or `tidyr` function to the pipe chain in the code 
# at the bottom of this script to subset the `titanic` dataset to 
# four columns, so that the first lines of the resulting data 
# frame (`titanic_1`) look like this: 
#
##  Survived  Pclass   Age      Sex
##         0       3    22     male
##         1       1    38   female
##         1       3    26   female
##         1       1    35   female
##         0       3    35     male
##         0       3    NA     male
#
# I have already loaded the `titanic` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

titanic_1 <- titanic %>% 

As the instructions suggest, inspect the contents of the titanic data frame, then try the select() step in the Console.

> str(titanic)
'data.frame':   891 obs. of  12 variables:
 $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
 $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
 $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
 $ Name       : chr  "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
 $ Sex        : chr  "male" "female" "female" "female" ...
 $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
 $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
 $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
 $ Ticket     : chr  "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
 $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
 $ Cabin      : chr  "" "C85" "" "C123" ...
 $ Embarked   : chr  "S" "C" "S" "S" ...
> head(titanic)
  PassengerId Survived Pclass                                                Name    Sex
1           1        0      3                             Braund, Mr. Owen Harris   male
2           2        1      1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female
3           3        1      3                              Heikkinen, Miss. Laina female
4           4        1      1        Futrelle, Mrs. Jacques Heath (Lily May Peel) female
5           5        0      3                            Allen, Mr. William Henry   male
6           6        0      3                                    Moran, Mr. James   male
  Age SibSp Parch           Ticket    Fare Cabin Embarked
1  22     1     0        A/5 21171  7.2500              S
2  38     1     0         PC 17599 71.2833   C85        C
3  26     0     0 STON/O2. 3101282  7.9250              S
4  35     1     0           113803 53.1000  C123        S
5  35     0     0           373450  8.0500              S
6  NA     0     0           330877  8.4583              Q
> titanic_1 <- titanic %>% 
+     select(Survived, Pclass, Age, Sex)
> head(titanic_1)
  Survived Pclass Age    Sex
1        0      3  22   male
2        1      1  38 female
3        1      3  26 female
4        1      1  35 female
5        0      3  35   male
6        0      3  NA   male

Once you have a rough idea of the contents, add one line at the end of the script so that it looks as follows.

Contents of step1_titanic.R after editing

# Add a `dplyr` or `tidyr` function to the pipe chain in the code 
# at the bottom of this script to subset the `titanic` dataset to 
# four columns, so that the first lines of the resulting data 
# frame (`titanic_1`) look like this: 
#
##  Survived  Pclass   Age      Sex
##         0       3    22     male
##         1       1    38   female
##         1       3    26   female
##         1       1    35   female
##         0       3    35     male
##         0       3    NA     male
#
# I have already loaded the `titanic` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

titanic_1 <- titanic %>% 
  select(Survived, Pclass, Age, Sex)

The last line is the one I added. When you have finished editing, save step1_titanic.R and type submit() in the Console.

> submit()

| Sourcing your script...

| That's correct!

  |=================================================                              |  62%
| In this question, you will continue to build on the data cleaning you started in the
| last question. I opened a new R script for you with the previous question completed.
| As a next step, you now need to remove any observations that are missing a value for
| Age. To do this, add a function from `dplyr` or `tidyr` to the pipe chain in the
| script. Your goal is to re-create the example output shown in the comments of the
| script. When you are ready to submit your script, save the script and type `submit()`,
| or type `reset()` to reset the script to its original state.

Contents of step2_titanic.R before editing

# After the previous question, you should have transformed the 
# `titanic` data to look like this:
#
##  Survived  Pclass   Age      Sex
##         0       3    22     male
##         1       1    38   female
##         1       3    26   female
##         1       1    35   female
##         0       3    35     male
##         0       3    NA     male
#
# Now add a `dplyr` or `tidyr` function to the pipe chain in the 
# code at the bottom of this script so that `titanic_2` data frame 
# looks like this, with observations where "Age" was missing removed: 
#
##  Survived  Pclass   Age      Sex
##         0       3    22     male
##         1       1    38   female
##         1       3    26   female
##         1       1    35   female
##         0       3    35     male
##         0       1    54     male
#
# I have already loaded the `titanic` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

titanic_2 <- titanic %>% 
  select(Survived, Pclass, Age, Sex) %>%

First, check in the Console that the line added at the end actually produces the specified output.

> titanic_2 <- titanic %>% 
+     select(Survived, Pclass, Age, Sex) %>%
+     filter(!is.na(Age))
> head(titanic_2)
  Survived Pclass Age    Sex
1        0      3  22   male
2        1      1  38 female
3        1      3  26 female
4        1      1  35 female
5        0      3  35   male
6        0      1  54   male
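
Why `filter(!is.na(Age))` rather than a comparison such as `Age != NA` can be seen on a small example. This is a minimal sketch, assuming `dplyr` is installed; the data frame `toy` is hypothetical.

```r
library(dplyr)

# Hypothetical toy data with two missing ages
toy <- data.frame(Survived = c(0L, 1L, 0L, 1L),
                  Age      = c(22, NA, 35, NA))

# is.na() returns TRUE for the rows with a missing Age
is.na(toy$Age)

# Keep only the rows where Age is not missing
toy_2 <- toy %>% filter(!is.na(Age))
nrow(toy_2)

# Comparing with NA yields NA for every row, and filter() keeps only
# rows where the condition is TRUE, so no rows are kept
toy_wrong <- toy %>% filter(Age != NA)
nrow(toy_wrong)
```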

Contents of step2_titanic.R after editing

# After the previous question, you should have transformed the 
# `titanic` data to look like this:
#
##  Survived  Pclass   Age      Sex
##         0       3    22     male
##         1       1    38   female
##         1       3    26   female
##         1       1    35   female
##         0       3    35     male
##         0       3    NA     male
#
# Now add a `dplyr` or `tidyr` function to the pipe chain in the 
# code at the bottom of this script so that `titanic_2` data frame 
# looks like this, with observations where "Age" was missing removed: 
#
##  Survived  Pclass   Age      Sex
##         0       3    22     male
##         1       1    38   female
##         1       3    26   female
##         1       1    35   female
##         0       3    35     male
##         0       1    54     male
#
# I have already loaded the `titanic` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

titanic_2 <- titanic %>% 
  select(Survived, Pclass, Age, Sex) %>%
  filter(!is.na(Age))

I added the last line. Save the script and run submit().

> submit()

| Sourcing your script...

| You nailed it! Good job!

  |===========================================================                    |  75%
| In this question, you will again continue to build on the data cleaning you started in
| the last two questions. I opened a new R script for you with the previous steps
| completed. As a next step, you now need to create a new column called agecat that
| divides a person's age into three broad categories (Under 15, 15 to 50, Over 50). To
| do this, add a function from `dplyr` or `tidyr` to the pipe chain in the script. Your
| goal is to re-create the example output shown in the comments of the script. When you
| are ready to submit your script, save the script and type `submit()`, or type
| `reset()` to reset the script to its original state.

Contents of step3_titanic.R before editing

# After the previous question, you should have transformed the 
# `titanic` data to look like this:
#
##  Survived  Pclass   Age      Sex
##         0       3    22     male
##         1       1    38   female
##         1       3    26   female
##         1       1    35   female
##         0       3    35     male
##         0       1    54     male
#
# Now add a `dplyr` or `tidyr` function to the pipe chain in the 
# code at the bottom of this script so that `titanic_2` data frame 
# looks like this, with a new column added that specifies whether 
# the person's age is under 15, from 15 to 50, or over 50
# (this column should have a factor class, with factor levels ordered
# by age-- "Under 15", "15 to 50", "Over 50"): 
#
##   Survived Pclass   Age     Sex      agecat
##          0      3    22    male    15 to 50
##          1      1    38  female    15 to 50
##          1      3    26  female    15 to 50
##          1      1    35  female    15 to 50
##          0      3    35    male    15 to 50
##          0      1    54    male     Over 50
#
# I have already loaded the `titanic` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

titanic_3 <- titanic %>% 
  select(Survived, Pclass, Age, Sex) %>%
  filter(!is.na(Age)) %>%

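First, check in the Console that the line added at the end actually produces the specified output. Here I use cut() (see ?cut in the Help for details). It splits Age into the intervals [0, 14.99], (14.99, 50], and (50, 150]; the first interval includes its lower end because of include.lowest = TRUE. To see which values of Age actually occur, I first run unique(titanic$Age).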

> unique(titanic$Age)
 [1] 22.00 38.00 26.00 35.00    NA 54.00  2.00 27.00 14.00  4.00 58.00 20.00 39.00 55.00
[15] 31.00 34.00 15.00 28.00  8.00 19.00 40.00 66.00 42.00 21.00 18.00  3.00  7.00 49.00
[29] 29.00 65.00 28.50  5.00 11.00 45.00 17.00 32.00 16.00 25.00  0.83 30.00 33.00 23.00
[43] 24.00 46.00 59.00 71.00 37.00 47.00 14.50 70.50 32.50 12.00  9.00 36.50 51.00 55.50
[57] 40.50 44.00  1.00 61.00 56.00 50.00 36.00 45.50 20.50 62.00 41.00 52.00 63.00 23.50
[71]  0.92 43.00 60.00 10.00 64.00 13.00 48.00  0.75 53.00 57.00 80.00 70.00 24.50  6.00
[85]  0.67 30.50  0.42 34.50 74.00
> titanic_3 <- titanic %>% 
+     select(Survived, Pclass, Age, Sex) %>%
+     filter(!is.na(Age)) %>%
+     mutate(agecat = cut(Age, breaks = c(0, 14.99, 50, 150), 
+                         include.lowest = TRUE,
+                         labels = c("Under 15", "15 to 50", "Over 50")))
> head(titanic_3)
  Survived Pclass Age    Sex   agecat
1        0      3  22   male 15 to 50
2        1      1  38 female 15 to 50
3        1      3  26 female 15 to 50
4        1      1  35 female 15 to 50
5        0      3  35   male 15 to 50
6        0      1  54   male  Over 50
> titanic_3 <- titanic %>% 
+     select(Survived, Pclass, Age, Sex) %>%
+     filter(!is.na(Age)) %>%
+     mutate(agecat = cut(Age, breaks = c(-1, 14, 50, 150), 
+                         include.lowest = TRUE,
+                         labels = c("Under 15", "15 to 50", "Over 50")))
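
The two sets of breaks tried above are not equivalent. The sketch below uses base R only and a hypothetical vector `ages` (the value 14.5 does occur in `unique(titanic$Age)` above) to show where they differ.

```r
# Hypothetical ages chosen to sit on or near the interval boundaries
ages <- c(0.42, 14, 14.5, 15, 50, 50.5, 80)
labs <- c("Under 15", "15 to 50", "Over 50")

# Intervals: [0, 14.99], (14.99, 50], (50, 150]
a <- cut(ages, breaks = c(0, 14.99, 50, 150),
         include.lowest = TRUE, labels = labs)

# Intervals: [-1, 14], (14, 50], (50, 150]
b <- cut(ages, breaks = c(-1, 14, 50, 150),
         include.lowest = TRUE, labels = labs)

# Side-by-side comparison; only the row for 14.5 differs
data.frame(ages, a, b)

# The result is a factor whose levels follow the order of `labels`
levels(a)
```

With the second set of breaks, an age of 14.5 is labelled "15 to 50", so the first set matches the intended meaning of "Under 15" more closely. In both cases an age of exactly 50 falls in "15 to 50", because the intervals are closed on the right by default.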

Contents of step3_titanic.R after editing

# After the previous question, you should have transformed the 
# `titanic` data to look like this:
#
##  Survived  Pclass   Age      Sex
##         0       3    22     male
##         1       1    38   female
##         1       3    26   female
##         1       1    35   female
##         0       3    35     male
##         0       1    54     male
#
# Now add a `dplyr` or `tidyr` function to the pipe chain in the 
# code at the bottom of this script so that `titanic_2` data frame 
# looks like this, with a new column added that specifies whether 
# the person's age is under 15, from 15 to 50, or over 50
# (this column should have a factor class, with factor levels ordered
# by age-- "Under 15", "15 to 50", "Over 50"): 
#
##   Survived Pclass   Age     Sex      agecat
##          0      3    22    male    15 to 50
##          1      1    38  female    15 to 50
##          1      3    26  female    15 to 50
##          1      1    35  female    15 to 50
##          0      3    35    male    15 to 50
##          0      1    54    male     Over 50
#
# I have already loaded the `titanic` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

titanic_3 <- titanic %>% 
  select(Survived, Pclass, Age, Sex) %>%
  filter(!is.na(Age)) %>%
  mutate(agecat = cut(Age, breaks = c(0, 14.99, 50, 150), 
                      include.lowest = TRUE,
                      labels = c("Under 15", "15 to 50", "Over 50")))

When you have finished editing, save the script and run submit().

> submit()

| Sourcing your script...

| Excellent work!

  |=====================================================================          |  88%
| In this question, you will continue to build on the data cleaning you from the last
| three questions. I opened a new R script for you with the previous steps completed. As
| a next step, you now need to generate a data frame with the number of passengers,
| number of survivors, and percent survival stratified by passenger class, age category,
| and sex. To do this, add one or more functions from `dplyr` or `tidyr` to the pipe
| chain in the script. Your goal is to re-create the example output shown in the
| comments of the script. When you are ready to submit your script, save the script and
| type `submit()`, or type `reset()` to reset the script to its original state.

Contents of step4_titanic.R before editing

# After the previous question, you should have transformed the `titanic`
# data to look like this:
#
##   Survived Pclass   Age     Sex      agecat
##          0      3    22    male    15 to 50
##          1      1    38  female    15 to 50
##          1      3    26  female    15 to 50
##          1      1    35  female    15 to 50
##          0      3    35    male    15 to 50
##          0      1    54    male     Over 50
#
# Add one or more `dplyr` or `tidyr` functions to the pipe chain in 
# the code at the bottom of the script to change the `titanic` 
# dataset. The first six lines of the final `titanic_4` dataset 
# should look like the following example, with the number of
# passengers, number of survivors, and percent survival stratified
# by passenger class, age category, and sex. Be sure to use the 
# same column names as shown in the example output. 
#
## Pclass   agecat    Sex      N     survivors   perc_survived
## <int>   <fctr>    <chr>   <int>     <int>         <dbl>
##   1    Under 15  female     2         1        50.000000
##   1    Under 15    male     3         3       100.000000
##   1    15 to 50  female    70        68        97.142857
##   1    15 to 50    male    72        32        44.444444
##   1    Over 50   female    13        13       100.000000
##   1    Over 50     male    26         5        19.230769
#
# I have already loaded the `titanic` data frame for you, so you 
# can explore it and test out your code in the console.
#
# When you are ready submit your answer, save the script and type 
# submit(), or type reset() to reset the script to its original 
# state. 

titanic_4 <- titanic %>% 
  select(Survived, Pclass, Age, Sex) %>%
  filter(!is.na(Age)) %>%
  mutate(agecat = cut(Age, breaks = c(0, 14.99, 50, 150), 
                      include.lowest = TRUE,
                      labels = c("Under 15", "15 to 50",
                                 "Over 50"))) %>%

I added lines at the end and tried them, but did not reach an answer that swirl accepted. Using skip() also ends in an error.

> titanic_4 <- titanic %>% 
+     select(Survived, Pclass, Age, Sex) %>%
+     filter(!is.na(Age)) %>%
+     mutate(agecat = cut(Age, breaks = c(0, 14.99, 50, 150), 
+                         include.lowest = TRUE,
+                         labels = c("Under 15", "15 to 50",
+                                    "Over 50"))) %>%
+     group_by(Pclass, agecat, Sex) %>%
+     summarize(N = n(),
+               survivors = sum(Survived == 1),
+               perc_survived = 100 * survivors / N)
> head(titanic_4)
# A tibble: 6 x 6
# Groups:   Pclass, agecat [3]
  Pclass agecat   Sex        N survivors perc_survived
   <int> <fct>    <chr>  <int>     <int>         <dbl>
1      1 Under 15 female     2         1          50  
2      1 Under 15 male       3         3         100  
3      1 15 to 50 female    70        68          97.1
4      1 15 to 50 male      72        32          44.4
5      1 Over 50  female    13        13         100  
6      1 Over 50  male      26         5          19.2
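
The `group_by()` plus `summarize()` pattern used above can be checked by hand on a small table. This is a minimal sketch, assuming `dplyr` is installed; the data frame `toy` is hypothetical, and the final `ungroup()` is an addition that is not part of the lesson's script.

```r
library(dplyr)

# Hypothetical toy data: five passengers in three (Pclass, Sex) groups
toy <- data.frame(
  Pclass   = c(1L, 1L, 1L, 3L, 3L),
  Sex      = c("female", "female", "male", "male", "male"),
  Survived = c(1L, 0L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)

toy_4 <- toy %>%
  group_by(Pclass, Sex) %>%
  summarize(N = n(),                              # passengers per group
            survivors = sum(Survived == 1),       # survivors per group
            perc_survived = 100 * survivors / N) %>%
  ungroup()                                       # drop the remaining grouping

toy_4
```

Inside one `summarize()` call, a column defined earlier (`survivors`, `N`) can be used by a later expression, which is what makes the `perc_survived` line work. Newer versions of dplyr may print an informational message about the grouping of the result; it is not an error.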

> submit()

| Sourcing your script...

Error in source(e$script_temp_path, encoding = "UTF-8") : 
  /tmp/Rtmpr2U5Ox/step_4_titanic.R:43:0: unexpected end of input
41:                       labels = c("Under 15", "15 to 50",
42:                                  "Over 50"))) %>%
   ^

| Not exactly. Give it another go.

| Check the example output in the comments of the script I opened for you. Make sure the
| columns in your output are in the same order as shown in the example output in the
| script comments. Try running your code in the console and printing out the result to
| determine whether it's successfully generating the desired final data frame before you
| submit it.

warning messages from top-level task callback 'mini'
Warning messages:
1: Detecting old grouped_df format, replacing `vars` attribute by `groups` 
2: Detecting old grouped_df format, replacing `vars` attribute by `groups` 

> skip()

| I just sourced the following script, which demonstrates one possible solution.

Contents of step4_titanic-correct.R

# After the previous question, you should have transformed the `titanic`
# data to look like this:
#
##   Survived Pclass   Age     Sex      agecat
##          0      3    22    male    15 to 50
##          1      1    38  female    15 to 50
##          1      3    26  female    15 to 50
##          1      1    35  female    15 to 50
##          0      3    35    male    15 to 50
##          0      1    54    male     Over 50
#
# Use a `dplyr` or `tidyr` function or functions to change the `titanic` dataset so that the 
# first six lines look like this, where you've added, for each combination of 
# `Pclass`, `agecat`, and `Sex`, the columns `N` (total number of passengers 
# in that group), `survivors` (the number of people in the group who survived),
# and `perc_survived` (the percent of people in the group who survived). 
#
## Pclass   agecat    Sex      N     survivors   perc_survived
## <int>   <fctr>    <chr>   <int>     <int>         <dbl>
##   1    Under 15  female     2         1         50.00000
##   1    Under 15    male     3         3        100.00000
##   1    15 to 50  female    68        67         98.52941
##   1    15 to 50    male    69        31         44.92754
##   1    Over 50   female    15        14         93.33333
##   1    Over 50     male    29         6         20.68966
#
# When you are ready to move on, save the script and type submit(), or type 
# reset() to reset the script to its original state.

titanic_4 <- titanic %>% 
  select(Survived, Pclass, Age, Sex) %>%
  filter(!is.na(Age)) %>%
  mutate(agecat = cut(Age, breaks = c(0, 14.99, 50, 150), 
                      include.lowest = TRUE,
                      labels = c("Under 15", "15 to 50", "Over 50"))) %>%
  group_by(Pclass, agecat, Sex) %>%
  summarize(N = n(),
            survivors = sum(Survived == 1),
            perc_survived = 100 * survivors / N)
Press Enter when you are ready to continue...

| Try again. Getting it right on the first try is boring anyway!

| Check the example output in the comments of the script I opened for you. Make sure the
| columns in your output are in the same order as shown in the example output in the
| script comments. Try running your code in the console and printing out the result to
| determine whether it's successfully generating the desired final data frame before you
| submit it.

warning messages from top-level task callback 'mini'
Warning messages:
1: Detecting old grouped_df format, replacing `vars` attribute by `groups` 
2: Detecting old grouped_df format, replacing `vars` attribute by `groups` 
> bye()

| Leaving swirl now. Type swirl() to resume.

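I could not fix this, so I ended the session here. Note that the "unexpected end of input" error at line 43 shows the sourced script still ending with the trailing %>% on line 42, which suggests that the lines tested in the Console had not been saved into the script when submit() was called.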
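
The `unexpected end of input` message in the log above is what R reports when a file stops in the middle of an expression, for example right after a trailing `%>%`. The sketch below reproduces this with base R's `parse()`; the helper `parses()` is hypothetical and written only for this illustration.

```r
# A pipe chain that stops after the operator, and a completed one
incomplete <- "x <- c(1, 2, 3) %>%"
complete   <- "x <- c(1, 2, 3) %>%\n  sum()"

# Hypothetical helper: TRUE if the code is syntactically complete.
# parse() only checks the syntax; it does not run the code.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

parses(incomplete)
parses(complete)
```

Checking a script this way before calling submit() can help separate a syntax problem in the saved file from a wrong answer.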

6.4.13 Lesson 13: Text Manipulation Functions

This lesson covers the basics of the Base R functions for working with strings, such as paste, paste0, nchar, toupper, and tolower.

> swirl()

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you
| did then. If you are new, call yourself something unique.

What shall I call you? SL

| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

Choosing 2 here would resume at the point where the error occurred, so select 3.

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: The R Programming Environment
3: Take me to the swirl course repository!

Selection: 2

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 13
  |                                                                               |   0%

| Most common types of data are encoded in text, even if that text is representing
| numerical values, so being able to manipulate text as a software developer is
| essential. R provides several built-in tools for manipulating text, and there is a
| rich ecosystem of packages for R for text based analysis. First let’s concentrate on
| some basic text manipulation functions.

...

  |====                                                                           |   5%
| Text in R is represented as a string object, which looks like a phrase surrounded by
| quotation marks in the R console. For example "Hello!" and 'Strings are fun!' are both
| strings. You can tell whether an object is a string using the is.character() function.
| Strings are also known as characters in R.

...

  |========                                                                       |  11%
| Assign the string "Hello World!" to a variable called x.

> x <- "Hello World!"

| That's a job well done!

  |============                                                                   |  16%
| Now look at the value of that variable by entering x into the R console.

> x
[1] "Hello World!"

| You are quite good my friend!

  |=================                                                              |  21%
| You can combine several strings using the paste() function. Provide the arguments
| "Square" "Circle" and "Triangle" to paste() to see how it works.

> paste("Square", "Circle", "Triangle")
[1] "Square Circle Triangle"

| That's the answer I was looking for.

  |=====================                                                          |  26%
| By default the paste() function inserts a space between each word. You can insert a
| different string between each word by specifying the sep argument. Paste together the
| same three words as before, but this time specify sep = "+"

> paste("Square", "Circle", "Triangle", sep = "+")
[1] "Square+Circle+Triangle"

| You are quite good my friend!

  |=========================                                                      |  32%
| Which of the following expressions evaluates to "Yes_No"

1: paste("Yes", "No", sep = "_")
2: paste(c("Yes", "No"), sep = "_")
3: paste("Yes", "No", "_")

Selection: 1

| Great job!
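
To see why only the first choice gives "Yes_No", all three expressions can be run directly. This sketch uses base R only; the variable names are chosen for this illustration.

```r
# sep is placed between the separate arguments
choice1 <- paste("Yes", "No", sep = "_")

# A single vector argument: there is nothing to separate,
# so each element is returned unchanged
choice2 <- paste(c("Yes", "No"), sep = "_")

# "_" is treated as a third string and joined with the default space
choice3 <- paste("Yes", "No", "_")

choice1
choice2
choice3

# To join the elements of one vector, use collapse instead of sep
joined <- paste(c("Yes", "No"), collapse = "_")
joined
```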

  |=============================                                                  |  37%
| A shortcut for combining all of the string arguments without any characters in between
| each of them is to use the paste0() function. Use paste0() to paste together "Square"
| "Circle" and "Triangle"

> paste0("Square", "Circle", "Triangle")
[1] "SquareCircleTriangle"

| Excellent job!

  |=================================                                              |  42%
| You can also provide a vector of strings as an argument to paste().

...

  |=====================================                                          |  47%
| First assign a string vector containing "Square" "Circle" and "Triangle" to a variable
| called shapes.

> shapes <- c("Square", "Circle", "Triangle")

| Perseverance, that's the answer.

  |==========================================                                     |  53%
| Now enter paste("My favorite shape is a", shapes) into the R console.

> paste("My favorite shape is a", shapes)
[1] "My favorite shape is a Square"   "My favorite shape is a Circle"  
[3] "My favorite shape is a Triangle"

| Perseverance, that's the answer.

  |==============================================                                 |  58%
| As you can see, all of the possible string combinations are produced when you provide
| a vector of strings as an argument to paste().

...

  |==================================================                             |  63%
| You can also collapse all of the elements of a vector of strings into a single string
| by specifying the collapse argument. Enter paste(shapes, collapse = " ") into the R
| console to see how this works.

> paste(shapes, collapse = " ")
[1] "Square Circle Triangle"

| You're the best!
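
The `sep` and `collapse` arguments do different jobs and can be combined. A short sketch in base R, reusing the `shapes` vector from the lesson:

```r
shapes <- c("Square", "Circle", "Triangle")

# collapse joins the elements of the result into one string
one_string <- paste(shapes, collapse = ", ")

# paste0() works element by element on vectors of equal length
numbered <- paste0(shapes, 1:3)

# sep goes between the arguments, collapse between the resulting elements
both <- paste("Shape", shapes, sep = ": ", collapse = "; ")

one_string
numbered
both
```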

  |======================================================                         |  68%
| Besides pasting strings together, there are a few other basic string manipulation
| functions you should be made aware of.

...

  |==========================================================                     |  74%
| The nchar() function counts the number of characters in a string. Enter nchar("Count
| Me!") into the R console to try this out.

> nchar("Count Me!")
[1] 9

| You got it right!

  |==============================================================                 |  79%
| The toupper() and tolower() functions make strings all uppercase or lowercase
| respectively. First create a string vector called cases which contains the strings
| "CAPS" "low" and "Title"

> cases <- c("CAPS", "low", "Title")

| You got it!

  |===================================================================            |  84%
| Now enter toupper(cases) into the R console.

> toupper(cases)
[1] "CAPS"  "LOW"   "TITLE"

| All that practice is paying off!

  |=======================================================================        |  89%
| Now try entering tolower(cases) into the R console.

> tolower(cases)
[1] "caps"  "low"   "title"

| That's a job well done!
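
Like `toupper()` and `tolower()`, `nchar()` works on each element of a vector. One common use of the case functions, sketched here in base R, is comparing strings while ignoring case.

```r
cases <- c("CAPS", "low", "Title")

# Number of characters in each element
lengths_per_string <- nchar(cases)
lengths_per_string

# Case-insensitive comparison: convert first, then compare
is_title <- tolower(cases) == "title"
is_title
```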

  |===========================================================================    |  95%
| In this lesson we only covered the basics of manipulating strings in R, the real fun
| material is in the regular expressions lesson!

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: Generate Code
3: No

Selection: 3

| You are doing so well!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 13.

6.4.14 Lesson 14: Regular Expressions

This lesson covers regular expressions and grepl (grep logical). Both are used again in the final lesson, Lesson 15.

> swirl()

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you
| did then. If you are new, call yourself something unique.

What shall I call you? SL

| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: The R Programming Environment
3: Take me to the swirl course repository!

Selection: 2

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 14
  |                                                                               |   0%

| Now that we’ve covered the basics of string manipulation in R, let’s discuss the more
| advanced topic of regular expressions. A regular expression is a string that defines a
| pattern that could be contained within another string. A regular expression can be
| used for searching for a string, searching within a string, or replacing one part of a
| string with another string. In this lesson I might refer to a regular expression as a
| regex, just know that they’re the same thing.

...

  |==                                                                             |   2%
| Regular expressions use characters to define patterns of other characters. Although
| that approach may seem problematic at first, we’ll discuss meta-characters (characters
| that describe other characters) and how you can use them to create powerful regular
| expressions.

...

  |===                                                                            |   4%
| One of the most basic functions in R that uses regular expressions is the grepl()
| function, which takes two arguments - a regular expression and a string to be
| searched. If the string contains the specified regular expression then grepl() will
| return TRUE, otherwise it will return FALSE.

...

  |=====                                                                          |   6%
| Let’s contruct an example using grepl() with a string and a regex. First assign the
| regex "a" to a variable called regular_expression.

> regular_expression <- "a"

| You are amazing!

  |======                                                                         |   8%
| Next assign the string "Maryland" to a variable called string_to_search.

> string_to_search <- "Maryland"

| That's the answer I was looking for.

  |========                                                                       |  10%
| Now use grepl() to see whether regular_expression is present in string_to_search.

> grepl(regular_expression, string_to_search)
[1] TRUE

| Your dedication is inspiring!

  |=========                                                                      |  12%
| In the previous example we specify the regular expression "a" and store it in a
| variable called regular_expression. Remember that regular expressions are just
| strings! We also store the string "Maryland" in a variable called string_to_search.
| The regular expression "a" represents a single occurrence of the character "a". Since
| "a" is contained within "Maryland", grepl() returns the value TRUE.

...
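
`grepl()` also accepts a vector of strings and returns one TRUE or FALSE per element, which makes it convenient for subsetting. A minimal sketch in base R; the vector `places` is hypothetical.

```r
# Hypothetical vector of strings to search
places <- c("Maryland", "Ohio", "Iceland")

# One logical value per element
has_land <- grepl("land", places)
has_land

# Use the logical vector to keep only the matching elements
matching <- places[has_land]
matching

# Count the matches
n_matches <- sum(has_land)
n_matches
```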

  |===========                                                                    |  14%
| Which of the following expressions would return the value FALSE?

1: grepl("u", "Maryland")
2: grepl("a", "Maryland")
3: grepl("M", "Maryland")

Selection: 1

| Perseverance, that's the answer.

  |============                                                                   |  16%
| Let's try a few more examples. Before you enter each comming into the R console, try
| to guess what the result will be (either TRUE or FALSE). Type grepl("ryla",
| "Maryland") into the R console.

> grepl("ryla", "Maryland")
[1] TRUE

| You got it right!

  |==============                                                                 |  18%
| Now try typing grepl("Marly", "Maryland") into the R console.

> grepl("Marly", "Maryland")
[1] FALSE

| That's a job well done!

  |===============                                                                |  20%
| Since "ryla" is a sub-string of "Maryland", grepl() returns TRUE, however when a
| regular expression like "Marly" is searched grepl() returns FALSE because it is not a
| sub-string of "Maryland".

...
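
Two details of plain sub-string matching are worth trying out: the characters must appear contiguously and in order, and matching is case-sensitive unless `ignore.case = TRUE` is given. A short base R sketch:

```r
# All letters of "Marly" occur in "Maryland", but not in this order
in_order <- grepl("Marly", "Maryland")

# Matching is case-sensitive by default
lower_m <- grepl("mary", "Maryland")

# ignore.case = TRUE makes the match case-insensitive
lower_m_ignore <- grepl("mary", "Maryland", ignore.case = TRUE)

in_order
lower_m
lower_m_ignore
```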

  |=================                                                              |  22%
| There’s a dataset that comes with R called state.name which is a vector of strings,
| one for each state in the United States of America. We’re going to use this vector in
| several of the following examples. Let's take a look at this vector by entering
| head(state.name)

> head(state.name)
[1] "Alabama"    "Alaska"     "Arizona"    "Arkansas"   "California" "Colorado"  

| Excellent work!

  |===================                                                            |  24%
| Let’s build a regular expression for identifying several strings in this vector,
| specifically a regular expression that will match names of states that both start and
| end with a vowel. The state name could start and end with any vowel, so we won’t be
| able to match exact sub-strings like in the previous examples. Thankfully we can use
| metacharacters to look for vowels and other parts of strings.

...

  |====================                                                           |  25%
| I'm opening up a table of metacharacters that I'll be refering to for the rest of the
| lesson.

...

  |======================                                                         |  27%
| The first metacharacter that we’ll discuss is ".". The metacharacter that only
| consists of a period represents any character other than a new line (we’ll discuss new
| lines soon).

...

  |=======================                                                        |  29%
| Let's try using this metacharacter. Enter grepl(".", "Maryland") into the R console to
| see if there is one instance of any character present.

> grepl(".", "Maryland")
[1] TRUE

| You're the best!

  |=========================                                                      |  31%
| Let's try one more time using this metacharacter. Enter grepl(".", "") into the R
| console.

> grepl(".", "")
[1] FALSE

| Keep up the great work!

  |==========================                                                     |  33%
| There aren't any characters in an empty string, there this expression is evaluated to
| FALSE.

...

  |============================                                                   |  35%
| As you can see the period metacharacter is very liberal. This metacharacter is most
| userful when you don’t care about a set of characters in a regular expression. Enter
| grepl("a.b", c("aaa", "aab", "abb", "acadb")) into the R console to see what happens.

> grepl("a.b", c("aaa", "aab", "abb", "acadb"))
[1] FALSE  TRUE  TRUE  TRUE

| You are really on a roll!

  |=============================                                                  |  37%
| In the previos case grepl() returns TRUE for all strings that contain an "a" followed
| by any other character followed by a "b".

...
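
The behavior of the period can be checked by extracting what was actually matched. The following is a minimal sketch in base R; the variable names are illustrative and not part of the lesson.

```r
# Strings from the exercise above
x <- c("aaa", "aab", "abb", "acadb")

# TRUE where an "a", then any one character, then a "b" occurs
hits <- grepl("a.b", x)
print(hits)

# regexpr() locates the first match in each string and regmatches() extracts it.
# Strings without a match are dropped from the result.
matched <- regmatches(x, regexpr("a.b", x))
print(matched)
```

In "acadb" the extracted match is "adb": the period stands for the "d" between the second "a" and the "b".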

  |===============================                                                |  39%
| You can specify a regular expression that contains a certain number of characters or
| metacharacters using the enumeration metacharacters.

...

  |=================================                                              |  41%
| The "+" metacharacter indicates that one or more of the preceding expression should be
| present and "*" indicates that zero or more of the preceding expression is present.
| Use grepl() to see if the string "Maryland" contains one for more of "a"

> greple("a", "Maryland")
Error in greple("a", "Maryland") : could not find function "greple"
> grepl("a", "Maryland")
[1] TRUE

| Not exactly. Give it another go. Or, type info() for more options.

| Just type grepl("a+", "Maryland")

> grepl("a+", "Maryland")
[1] TRUE

| You nailed it! Good job!

  |==================================                                             |  43%
| Now use grepl() to see if the string "Maryland" contains zero for more of "x"

> grepl("x*", "Maryland")
[1] TRUE

| You nailed it! Good job!

  |====================================                                           |  45%
| You can also specify exact numbers of expressions using curly brackets {}. For example
| "a{5}" specifies "a exactly five times," "a{2,5}" specifies "a between 2 and 5 times,"
| and "a{2,}" specifies "a at least 2 times."

...
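
A small sketch may help to see how the counts behave. It assumes the anchors "^" and "$" (introduced later in this lesson) to make the counts apply to the whole string; the test strings are made up for illustration.

```r
x <- c("a", "aa", "aaaaa", "aaaaaa")

# Anchored with ^ and $, the counts apply to the whole string
exactly_5  <- grepl("^a{5}$", x)
two_to_5   <- grepl("^a{2,5}$", x)
at_least_2 <- grepl("^a{2,}$", x)
print(rbind(exactly_5, two_to_5, at_least_2))

# Without anchors, "a{5}" also matches inside a longer run of "a"
unanchored <- grepl("a{5}", "aaaaaa")
print(unanchored)

# "x*" allows zero occurrences, so it matches an empty string at position 1
empty_match <- regmatches("Maryland", regexpr("x*", "Maryland"))
print(empty_match)
```

This is also why grepl("x*", "Maryland") returns TRUE even though "Maryland" contains no "x".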

  |=====================================                                          |  47%
| Use grepl() to see if the string "Mississippi" contains exactly two adjacent "s"
| (using curly brackets).

> grepl("s{2}", "Mississippi")
[1] TRUE

| Excellent job!

  |=======================================                                        |  49%
| Use grepl() to see if the string "Mississippi" contains two or three adjacent "i"
| (using curly brackets).

> grepl("i{2,3}", "Mississippi")
[1] FALSE

| You got it right!

  |========================================                                       |  51%
| In addition to curly brackets you can use parentheses "()" to create a capturing
| group. A capturing group allows you to use quantifiers on other regular expressions.
| Let's use a capturing group to see if the string "Mississippi" contains the substring
| "iss" twice.

...

  |==========================================                                     |  53%
| Type grepl("(iss){2}", "Mississippi") into the R console to see if the substring "iss"
| exists twice adjacently in the string "Mississippi"

> grepl("(iss){2}", "Mississippi")
[1] TRUE

| Keep working like that and you'll get there!

  |===========================================                                    |  55%
| Notice that the regular expression in the previous example is essentially the same as
| "ississ"

...

  |=============================================                                  |  57%
| Which of the following expressions will evaluate to TRUE? This is tough! Take a minute
| to think about it.

1: grepl("(i.{3}){3}", "Mississippi")
2: grepl("(i.{2}){4}", "Mississippi")
3: grepl("(i.{2}){3}", "Mississippi")

Selection: 3

| Excellent work!
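
To see why only the third option is TRUE, the three candidates can be evaluated side by side. This is a sketch for checking the answer, not part of the lesson; the variable names are illustrative.

```r
word <- "Mississippi"
patterns <- c("(i.{3}){3}", "(i.{2}){4}", "(i.{2}){3}")

# Evaluate each candidate pattern against the same word
results <- vapply(patterns, function(p) grepl(p, word), logical(1))
print(results)

# Show the substring that the third pattern actually matches
matched <- regmatches(word, regexpr("(i.{2}){3}", word))
print(matched)
```

The third pattern matches "iss", "iss", "ipp" in a row, nine characters in total. Four such groups would need twelve characters starting at an "i", but only ten characters remain from the first "i" onward. With "i.{3}", the first group "issi" would have to be followed directly by another "i", which is not the case.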

  |==============================================                                 |  59%
| You can specify sets of characters with regular expressions, some of which come built
| in, but you can build your own character sets too. It will be a good idea to reference
| the metachacter table I brought up before.

...

  |================================================                               |  61%
| First we'll discuss the built in character sets words ("\\w"), digits ("\\d"), and
| whitespace characters ("\\s"). Words specify any letter, digit, or a underscore,
| digits specify the digits 0 through 9, and whitespace specifies line breaks, tabs, or
| spaces. Each of these character sets have their own compliments - not words ("\\W"),
| not digits ("\\D"), and not whitespace characters ("\\S"). Each specifies all of the
| characters not included in their corresponding character sets.

...
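
The built-in character sets are often used with gsub() to keep or remove a whole class of characters. The following sketch uses made-up input strings; gsub() itself is covered in the next lesson.

```r
# \\D is everything that is not a digit, so removing it keeps only the digits
digits_only <- gsub("\\D", "", "Tel: 03-1234-5678")
print(digits_only)

# \\s is whitespace (spaces, tabs, line breaks); collapse each run into one space
squeezed <- gsub("\\s+", " ", "a  b\t c")
print(squeezed)

# \\W is anything that is not a letter, digit, or underscore
has_non_word <- grepl("\\W", c("abc_123", "abc 123"))
print(has_non_word)
```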

  |==================================================                             |  63%
| Let's try out a few of these character sets. Type grepl("\\d", "0123456789") into the
| R console to see if the string "0123456789" contains a digit.

> grepl("\\d", "0123456789")
[1] TRUE

| You are amazing!

  |===================================================                            |  65%
| Now let's try using the compliment of the "\\d" set. Enter grepl("\\D", "0123456789")
| into the R console to see if there are any non-digits in the string "0123456789"

> grepl("\\D", "0123456789")
[1] FALSE

| All that practice is paying off!

  |=====================================================                          |  67%
| Which of the following evaluates to FALSE?

1: grepl("\\s", "abcdefghijklmnopqrstuvwxyz0123456789")
2: grepl("\\d", "abcdefghijklmnopqrstuvwxyz0123456789")
3: grepl("\\w", "abcdefghijklmnopqrstuvwxyz0123456789")

Selection: 1

| You are really on a roll!

  |======================================================                         |  69%
| You can also specify specific character sets using straight brackets []. For example a
| character set of just the vowels would look like - "[aeiou]". You can find the
| complement to a specific character by putting a carrot ^ after the first bracket. For
| example "[^aeiou]" matches all characters except the lowercase vowels. You can also
| specify ranges of characters using a hyphen - inside of the brackets. For example
| "[a-m]" matches all of the lowercase characters between a and m, while "[5-8]" matches
| any digit between 5 and 8 inclusive.

...
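
A short sketch of the three bracket forms described above (a set, a negated set, and a range). The example strings are illustrative, and the "^" and "$" around "[5-8]" are the anchors introduced later in the lesson.

```r
# A set: does the string contain any lowercase vowel?
has_vowel <- grepl("[aeiou]", c("rhythms", "rhyme"))
print(has_vowel)

# A negated set: delete everything that is NOT a lowercase vowel
vowels_only <- gsub("[^aeiou]", "", "regular expression")
print(vowels_only)

# A range: which single digits lie between 5 and 8 inclusive?
digits <- as.character(0:9)
in_range <- digits[grepl("^[5-8]$", digits)]
print(in_range)
```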

  |========================================================                       |  71%
| Time to take some of the training wheels off! Use grepl() to see if there is a vowel
| in the string "rhythms"

> grepl("[aeiou]", "rhythms")
[1] FALSE

| Keep up the great work!

  |=========================================================                      |  73%
| Which of the following expressions evaluates to TRUE?

1: grepl("[aeiou]", "rhythms")
2: grepl("[a-mA-M]", "ABC")
3: grepl("[a-m]", "ABC")

Selection: 2

| Great job!

  |===========================================================                    |  75%
| You might be wondering how you can use regular expressions to match a particular
| punctuation mark since many punctuation marks are used as metacharacters! Putting two
| backslashes before a punctuation mark that is also a metacharacter indicates that you
| are looking for the symbol and not the metacharacter meaning. For example "\\."
| indicates you are trying to match a period in a string.

...
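
The difference between the metacharacter "." and the escaped "\\." can be seen in a small sketch. The argument fixed = TRUE is a base R option that is not used in the lesson; it is shown here only as an alternative way to search for a literal string.

```r
# In R source code "\\." is a two-character string: a backslash and a period
print(nchar("\\."))

# Unescaped, the period matches any character
any_char <- grepl(".", "no dots here")

# Escaped, it matches only a literal period
literal <- grepl("\\.", "no dots here")
print(c(any_char = any_char, literal = literal))

# fixed = TRUE treats the pattern as plain text instead of a regex
fixed_dot <- grepl(".", "http://www.jhsph.edu/", fixed = TRUE)
print(fixed_dot)

# Splitting a host name at its literal periods
parts <- strsplit("www.jhsph.edu", "\\.")[[1]]
print(parts)
```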

  |============================================================                   |  76%
| Use grepl() to see whether or not a peroid exists in the string
| "http://www.jhsph.edu/"

> grepl("\\.", "http://www.jhsph.edu/")
[1] TRUE

| Your dedication is inspiring!

  |==============================================================                 |  78%
| Which of the following expressions evaluates to TRUE?

1: grepl("\\+", "tragedy + time = humor")
2: grepl("\\*", "tragedy + time = humor")
3: grepl("\\.", "tragedy + time = humor")

Selection: 1

| Nice work!

  |================================================================               |  80%
| There are also metacharacters for matching the beginning and the end of a string which
| are "^" and "$" respectively.

...

  |=================================================================              |  82%
| Let's test these operators out. Enter grepl("^a", c("bab", "aab")) into the R console
| in order to see if either string begins with "a"

> grepl("^a", c("bab", "aab"))
[1] FALSE  TRUE

| You're the best!

  |===================================================================            |  84%
| Now use grepl() to see if the two strings from the last question end with the letter
| "b"

> grepl("$b", c("bab", "aab"))
[1] FALSE FALSE

| Not quite right, but keep trying. Or, type info() for more options.

| Take a look at the anser to the last question, and remember to use the "$"
| metacharacter which represeents the end of a string!

> grepl("b$", c("bab", "aab"))
[1] TRUE TRUE

| You are quite good my friend!
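
The first attempt above failed because "$" has to come after the character it anchors: "$b" asks for a "b" after the end of the string, which cannot exist. A minimal sketch with one extra, made-up string ("abc"); startsWith() is a base R helper that is not part of the lesson.

```r
x <- c("bab", "aab", "abc")

# "^" goes before the character that must start the string
starts_a <- grepl("^a", x)

# "$" goes after the character that must end the string
ends_b <- grepl("b$", x)

# Both anchors combined: starts with "a" and ends with "b"
both <- grepl("^a.*b$", x)
print(rbind(starts_a, ends_b, both))

# For a fixed prefix, base R's startsWith() gives the same answer without a regex
print(identical(startsWith(x, "a"), starts_a))
```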

  |====================================================================           |  86%
| The last metacharacter we’ll discuss is the OR metacharacter ("|"). The OR
| metacharacter matches either the regex on the left or the regex on the right side of
| this character.

...

  |======================================================================         |  88%
| Enter grepl("a|b", c("abc", "bcd", "cde")) into the R console to see which strings
| contain "a" or "b"

> grepl("a|b", c("abc", "bcd", "cde"))
[1]  TRUE  TRUE FALSE

| Perseverance, that's the answer.

  |=======================================================================        |  90%
| The expression grepl("North|South", c("South Dakota", "North Carolina", "West
| Virginia")) evaluates to which of the following?

1: TRUE TRUE FALSE
2: FALSE TRUE FALSE
3: TRUE FALSE FALSE

Selection: 1

| You're the best!

  |=========================================================================      |  92%
| Finally we’ve learned enough to create a regular expression that matches all state
| names that both begin and end with a vowel! This regular expression must match the
| beginning of a string, then one instance of a capalized vowel, then any characters
| until one instance of a lowercase vowel followed by the end of the string.

...

  |==========================================================================     |  94%
| Constructing this regular expression may be a little tough so I'll help you out! Enter
| start_end_vowel <- "^[AEIOU]{1}.+[aeiou]{1}$" into the R console to store this regex.

> start_end_vowel <- "^[AEIOU]{1}.+[aeiou]{1}$"

| Keep working like that and you'll get there!

  |============================================================================   |  96%
| Now you can see which states match this regular expression. Enter vowel_state_lgl <-
| grepl(start_end_vowel, state.name) into the R console to store a logical vector which
| indicates which strings in state.name matches the regular expression.

> vowel_state_lgl <- grepl(start_end_vowel, state.name)

| Excellent work!

  |=============================================================================  |  98%
| Now finally index state.name with the vector you created in the last question in order
| to see which state names start and end with a vowel!

> state.name[vowel_state_lgl]
[1] "Alabama"  "Alaska"   "Arizona"  "Idaho"    "Indiana"  "Iowa"     "Ohio"    
[8] "Oklahoma"

| You are quite good my friend!
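
The last three steps can also be written as one call, using the value = TRUE argument of grep(). This is an alternative sketch, not the lesson's approach; the pattern drops the {1} quantifiers, which are redundant because a bracket set matches exactly one character by default.

```r
# Same idea as start_end_vowel, without the redundant {1}
pattern <- "^[AEIOU].+[aeiou]$"

# value = TRUE returns the matching strings instead of their positions
vowel_states <- grep(pattern, state.name, value = TRUE)
print(vowel_states)

# Cross-check against the lesson's pattern and logical indexing
lesson_result <- state.name[grepl("^[AEIOU]{1}.+[aeiou]{1}$", state.name)]
print(identical(vowel_states, lesson_result))
```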

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Generate Code
2: Yes
3: No

Selection: 3

| Excellent job!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 14.

6.4.15 Lesson 15: The stringr Package

This lesson covers the stringr package, one of the packages in the tidyverse collection.

  • Tidyverse: R packages for data science
  • stringr Package: stringr provides a cohesive set of functions designed to make working with strings as easy as possible. It is built on top of stringi, which uses the ICU C library to provide fast, correct implementations of common string manipulations.
  • To use this package, run library(stringr) or library(tidyverse).
| Would you like to continue with one of these lessons?

1: R Programming Basic Building Blocks
2: The R Programming Environment Data Manipulation
3: No. Let me start something new.

Selection: 3

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: The R Programming Environment
3: Take me to the swirl course repository!

Selection: 2

| Please choose a lesson, or type 0 to return to course menu.

 1: Setting Up Swirl              2: Basic Building Blocks      
 3: Sequences of Numbers          4: Vectors                    
 5: Missing Values                6: Subsetting Vectors         
 7: Matrices and Data Frames      8: Logic                      
 9: Workspace and Files          10: Reading Tabular Data       
11: Looking at Data              12: Data Manipulation          
13: Text Manipulation Functions  14: Regular Expressions        
15: The stringr Package          

Selection: 15

| Attempting to load lesson dependencies...

| Package ‘stringr’ loaded correctly!

  |                                                                               |   0%

| The stringr package is wonderful for working with strings in R. Most of the functions
| in stringr take the same two arguments, a string and then a regex.

...

  |====                                                                           |   5%
| However before we get into stringr functions let's cover some of the fundamental
| string functions in R that use regular expressions.

...

Before turning to stringr, the lesson reviews how to manipulate strings with regular expressions in base R.

  |========                                                                       |  10%
| In previous lessons we’ve been using grepl() to see if a regex matches a string.
| grepl() stands for "grep logical." Try out grepl() below by entering grepl("[Ii]",
| c("Hawaii", "Illinois", "Kentucky")) into the R console.

> grepl("[Ii]", c("Hawaii", "Illinois", "Kentucky"))
[1]  TRUE  TRUE FALSE

| You're the best!

  |============                                                                   |  15%
| There’s also the old fashioned grep() which returns the indices of the vector that
| match the regex argument. Enter grep("[Ii]", c("Hawaii", "Illinois", "Kentucky")) into
| the R console to test it out.

> grep("[Ii]", c("Hawaii", "Illinois", "Kentucky"))
[1] 1 2

| You nailed it! Good job!

  |================                                                               |  20%
| What would the expression grep("[Aa]", c("Florida", "Alabama", "Idaho")) be evaluated
| to?

1: 2 3
2: 1 2
3: 1 3
4: 1 2 3

Selection: 4

| Great job!
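
grep() and grepl() report the same matches in two different forms. A minimal sketch of how they relate, using the vector from the question above:

```r
x <- c("Florida", "Alabama", "Idaho")

# Positions of the matching elements
idx <- grep("[Aa]", x)
print(idx)

# One TRUE/FALSE per element
lgl <- grepl("[Aa]", x)
print(lgl)

# which() converts the logical form into the index form
print(identical(idx, which(lgl)))

# value = TRUE returns the matching strings themselves
vals <- grep("[Aa]", x, value = TRUE)
print(vals)
```

All three names contain an "a" or "A", so the indices are 1 2 3.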

  |====================                                                           |  25%
| The sub() function takes as arguments a regex, a “replacement,” and a vector of
| strings. This function will replace the first instance of that regex found in each
| string. Try out sub() by entering sub("[Ii]", "1", c("Hawaii", "Illinois",
| "Kentucky")) into the R console.

> sub("[Ii]", "1", c("Hawaii", "Illinois", "Kentucky"))
[1] "Hawa1i"   "1llinois" "Kentucky"

| You are amazing!

  |========================                                                       |  30%
| The gsub() function is nearly the same as sub() except it will replace every instance
| of the regex that is matched in each string. Enter the exact line of code you used in
| the previous question, except use gsub() instead of sub().

> gsub("[Ii]", "1", c("Hawaii", "Illinois", "Kentucky"))
[1] "Hawa11"   "1ll1no1s" "Kentucky"

| You got it right!

  |============================                                                   |  35%
| The expression gsub("[Aao]", "0", c("Georgia", "Minnesota", "California")) once
| evaluated will result in which of the following

1: Ge0rgia Minnes0ta Calif0rnia
2: Ge0rgi0 Minnes0t0 C0lif0rni0
3: Georgi0 Minnesot0 C0liforni0

Selection: 2

| That's the answer I was looking for.
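
Running sub() and gsub() on the same input shows the difference between replacing the first match and replacing every match. A short sketch using the vector from the question above:

```r
x <- c("Georgia", "Minnesota", "California")

# sub() replaces only the first "A", "a" or "o" in each string
first_only <- sub("[Aao]", "0", x)
print(first_only)

# gsub() replaces every occurrence
all_hits <- gsub("[Aao]", "0", x)
print(all_hits)
```

Note that in "California" the first match is the "a" in second position, so sub() gives "C0lifornia".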

  |================================                                               |  40%
| The last functions we'll talk about before we get into stringr's functions is
| strsplit().

...

  |====================================                                           |  45%
| Store the names of all of the US states that contain two adjacent s's by entering
| two_s <- state.name[grep("ss", state.name)] into the R console.

> two_s <- state.name[grep("ss", state.name)]

| You got it!

  |========================================                                       |  50%
| Now take a look at the value of this variable by entering two_s into the R console.

> two_s
[1] "Massachusetts" "Mississippi"   "Missouri"      "Tennessee"    

| Excellent work!

  |===========================================                                    |  55%
| In order to split each string where the s's are located enter strsplit(two_s, "ss")
| into the R console.

> strsplit(two_s, "ss")
[[1]]
[1] "Ma"        "achusetts"

[[2]]
[1] "Mi"   "i"    "ippi"

[[3]]
[1] "Mi"   "ouri"

[[4]]
[1] "Tenne" "ee"   


| That's correct!
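
strsplit() returns a list with one character vector per input string, so the pieces usually need a further step before they can be used as a plain vector. A minimal sketch; lengths() and vapply() are base R functions that are not used in the lesson.

```r
two_s <- grep("ss", state.name, value = TRUE)
pieces <- strsplit(two_s, "ss")

# Number of pieces per state name
n_pieces <- lengths(pieces)
print(n_pieces)

# Take the first piece of each name as an ordinary character vector
first_piece <- vapply(pieces, function(p) p[1], character(1))
print(first_piece)
```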

Now the lesson finally moves on to stringr.

  |===============================================                                |  60%
| Finally onto the stringr package! This package takes a “data first” approach to
| functions involving regex, so usually the string is the first argument and the regex
| is the second argument. The majority of the function names in stringr begin with str_.

...
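
The "data first" argument order can be compared with the base R functions from earlier in this lesson. The sketch below assumes that the stringr package is installed and skips the comparison otherwise; str_detect() and str_replace_all() are stringr functions that the lesson itself does not cover.

```r
states <- c("Hawaii", "Illinois", "Kentucky")

if (requireNamespace("stringr", quietly = TRUE)) {
  # Base R: the regex comes first, the strings come later
  base_detect  <- grepl("[Ii]", states)
  base_replace <- gsub("[Ii]", "1", states)

  # stringr: the strings come first, the regex second
  str_detect_res  <- stringr::str_detect(states, "[Ii]")
  str_replace_res <- stringr::str_replace_all(states, "[Ii]", "1")

  print(identical(base_detect, str_detect_res))
  print(identical(base_replace, str_replace_res))
}
```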

  |===================================================                            |  65%
| The str_extract() function returns the sub-string of a string that matches the
| providied regular expression. Enter str_extract("Camaro Z28", "[0-9]+") into the R
| console to give this function a spin.

> str_extract("Camaro Z28", "[0-9]+")
[1] "28"

| Your dedication is inspiring!

  |=======================================================                        |  70%
| The str_order() function returns a numeric vector that corresponds to the alphabetical
| order of the strings in the provided vector. Enter str_order(c("p", "e", "n", "g"))
| into the R console.

> str_order(c("p", "e", "n", "g"))
[1] 2 4 3 1

| Excellent work!

  |===========================================================                    |  75%
| The str_pad() function pads strings with other characters which is often useful when
| the string is going to be eventually printed for a person to read. Try this out by
| entering str_pad("Thai", width = 8, side = "left", pad = "-") into the R console.

> str_pad("Thai", width = 8, side = "left", pad = "-")
[1] "----Thai"

| You got it!

  |===============================================================                |  80%
| The str_to_title() function acts just like tolower() and toupper() except it puts
| strings into Title Case. Enter str_to_title(c("CAPS", "low", "Title")) into the R
| console to see how this works.

> str_to_title(c("CAPS", "low", "Title"))
[1] "Caps"  "Low"   "Title"

| Excellent work!

  |===================================================================            |  85%
| The str_trim() function deletes whitespace from both sides of a string. Enter
| str_trim(" trim me ") into the R console.

> str_trim(" trim me ")
[1] "trim me"

| You're the best!

  |=======================================================================        |  90%
| Lastly the word() function allows you to index each word in a string as if it were a
| vector. Enter word("See Spot run.", 2) into the R console to test this out.

> word("See Spot run.", 2)
[1] "Spot"

| You are doing so well!

  |===========================================================================    |  95%
| That's the end of our tour of stringr!

...

  |===============================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Generate Code
2: Yes
3: No

Selection: 3

| Keep up the great work!

| You've reached the end of this lesson! Returning to the main menu...

End of Lesson 15.

6.2 Getting and Cleaning Data

Intermediate dplyr, tidyr, lubridate, oh my!

6.3 Advanced R Programming

Advanced

6.4 A (very) short introduction to R

6.4.1 About this swirl course

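This swirl course was created by Claudia Brauer (Wageningen University) in cooperation with her colleague Paul Torfs. The explanations are aimed mainly at Windows users at a university, but the course can of course also be used on a Mac and by people who are not university students. Note, however, that it asks you to enter the year you started at university and the year you were born, and that in one place it refers to Sinterklaas, the Dutch festival at which presents are traditionally given every year on December 5.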

One feature of the course is that it was made as a hands-on companion to the 2014 document of the same name on CRAN. Another, as the GitHub link below shows, is that the material is divided into eight modules. Module 1 through Module 3 are included in the swirl course, and the later ones are published as self-study modules: Module 4. Basic plotting, 5. Pretty plotting, 6. Reading data files, 7. Matrix operations, 8. Spatial data. Together they form a kind of Massive Open Online Course.

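The course is described as having been created only around 2018, and its behavior when an answer differs from the expected one does seem to have some problems. Rather than insisting on getting everything exactly right, it is fine in such cases to type skip() and look at the answer. I ended up using skip() twice myself: once in the part on matrices and once on the final question. The latter was simply an oversight on my part.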

  • GitHub: ClaudiaBrauer/A-very-short-introduction-to-R. The repository also contains the 2018 version of the PDF and an introduction for people who are new to swirl.
  • As the script there also says, install the course with swirl::install_course("A_(very)_short_introduction_to_R").

Below, I add comments only at the points where you are likely to stumble.

6.4.1 From installation to module selection

The first step is library(swirl), which declares that swirl will be used. This loads swirl.

> library(swirl)

| Hi! I see that you have some variables saved in your workspace. To keep things running
| smoothly, I recommend you clean up before starting swirl.

| Type ls() to see a list of the variables in your workspace. Then, type rm(list=ls()) to
| clear your workspace.

| Type swirl() when you are ready to begin.

The next command installs this swirl course.

> swirl::install_course("A_(very)_short_introduction_to_R")
  |=================================================================================| 100%

| Course installed successfully!

swirl is always started with the command swirl().

> swirl()

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you
| did then. If you are new, call yourself something unique.

What shall I call you? SL

Choose a course. 1 is this course; 2 is the basic swirl course introduced at the beginning of this document.

| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Module 1
2: Module 2
3: Module 3

Selection: 1

6.4.2 Module 1

  |                                                                                 |   0%

| This is the first module of the course `A (very) short introduction to R`. This course
| is based on a non-interactive tutorial with the same name, which can be downloaded from
| www.github.com/ClaudiaBrauer/A-very-short-introduction-to-R. The contents are the same
| (with a few exceptions), so you can open the pdf version alongside to look up how to do
| something you learned before or browse through the references on the last two pages.

...

Whenever this mark … appears, read up to that point and press Enter.

  |==                                                                               |   2%
| This swirl course is brand new, so it may still contain errors. If you find any, let me
| know on the issues page of the github-website.

...

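The course notes that it is brand new and may therefore contain errors. For a while, the explanations follow the PDF document. RStudio is also explained, and because you can check everything in practice as you go, this works better than reading the PDF.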


  |===                                                                              |   4%
| Now, let's get started.

...

  |=====                                                                            |   6%
| The RStudio interface consists of several windows. You can change the size of the
| windows by dragging the grey bars between the windows.

...

  |=======                                                                          |   8%
| Bottom left is the console window (also called command window/line). Here you can type
| commands after the > prompt and R will then execute your command. This is the most
| important window, because this is where R actually does stuff.

...

  |========                                                                         |  10%
| Top left is the editor window (also called script window). Here collections of commands
| (scripts) can be edited and saved. When you do not get this window, you can open it with
| 'File' > 'New' > 'R script'. Just typing a command in the editor window is not enough,
| it has to get into the command window before R executes the command. If you want to run
| a line from the script window (or the whole script), you can click 'Run' or press
| 'CTRL+ENTER' to send it to the command window.

...

  |==========                                                                       |  12%
| Top right is the workspace / history window. In the workspace window you can see which
| data and values R has in its memory. You can view and edit the values by clicking on
| them. The history window shows what has been typed before.

...

  |============                                                                     |  15%
| Bottom right is the files / plots / packages / help / viewer window. Here you can open
| files, view plots (also previous plots), install and load packages or use the help
| function.

...

  |==============                                                                   |  17%
| Your working directory is the folder on your computer in which you are currently
| working. When you ask R to open a certain file, it will look in the working directory
| for this file, and when you tell R to save a data file or figure, it will save it in the
| working directory.

...

  |===============                                                                  |  19%
| Before you start working, you should set your working directory to where all your data
| and script files are or should be stored. Within RStudio you can go to 'Session' > 'Set
| working directory' > 'Choose directory'. Please do this now.

...

  |=================                                                                |  21%
| Instead of clicking, you can also type in the command window setwd("directoryname"). For
| example setwd("M:/Hydrology/R/") . Make sure that the slashes are forward slashes and
| that you do not forget the apostrophes (for the reason of the apostrophes, see Module 3.
| R is case sensitive, so make sure you write capitals where necessary. You don't have to
| try this now.

...

This is explained in Lesson 2 of the R Programming course (2.2 Workspace and Files). getwd() shows the current working directory, and setwd() sets it. As the message says, in RStudio you can also set it from the menu: 'Session' > 'Set working directory' > 'Choose directory'. New files will be created there. Later you are told to name it swirl, so it is a good idea to create a directory called swirl from the start and make it the working directory.
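
A minimal sketch of working with getwd() and setwd(). The directory name "swirl" follows the suggestion above; placing it inside tempdir() is an assumption made here so that the example does not touch your own folders.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Create a directory called "swirl" (here inside R's temporary directory)
swirl_dir <- file.path(tempdir(), "swirl")
dir.create(swirl_dir, showWarnings = FALSE)

# Make it the working directory and confirm the change
setwd(swirl_dir)
current <- basename(getwd())
print(current)

# Go back to where we started
setwd(old_wd)
```

In practice you would replace swirl_dir with a path of your own, written with forward slashes as the message above explains.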

  |===================                                                              |  23%
| R can do many statistical and data analyses. They are organized in so-called packages or
| libraries. With the standard installation, most common packages are installed.

...

  |====================                                                             |  25%
| To get a list of all installed packages, go to the packages window. If the box in front
| of the package name is ticked, the package is loaded (activated) and can be used. Look
| up the swirl package in the packages window and see if the box is checked.

...

  |======================                                                           |  27%
| There are many more packages available on the R website. If you want to install and use
| a package (for example, the package called 'geometry') you should first install the
| package by clicking 'install packages' in the packages window and typing geometry or by
| typing install.packages("geometry") in the command window. You actually already did this
| for the swirl package.

...

  |========================                                                         |  29%
| Next, you should load the package by checking box in front of geometry or typing
| library(geometry) in the command window. You also did this for the swirl package
| (otherwise you couldn't have been doing these exercises).

...
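
Whether a package is installed and whether it is loaded are two separate questions, which can also be checked from the command window. The helper names is_installed() and is_attached() below are hypothetical, introduced only for this sketch; the package "stats" is used because it ships with R, and the install.packages() call is left commented out so that nothing is downloaded.

```r
# Hypothetical helper: TRUE if the package can be found on this computer
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

# Hypothetical helper: TRUE if the package has been loaded with library()
is_attached <- function(pkg) paste0("package:", pkg) %in% search()

print(is_installed("stats"))
print(is_attached("stats"))

# Typical pattern for the example package from the text above:
# if (!is_installed("geometry")) install.packages("geometry")
# library(geometry)
```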

  |=========================                                                        |  31%
| Now, let's get started with the real programming.

...

The actual exercises start here. The first is 10^2 + 36. 10^2 means 10 squared, \(10^2 = 10\times 10\).

  |===========================                                                      |  33%
| R can be used as a calculator. You can just type your equation in the command window
| after the >. Type 10^2 + 36.
10^2 + 36
## [1] 136
| You are quite good my friend!

  |=============================                                                    |  35%
| And R gave the answer directly. By the way, spaces do not matter.

...

  |==============================                                                   |  38%
| Compute the difference between 2019 and the year you started at this university and
| divide this by the difference between 2019 and the year you were born. Multiply this
| with 100 to get the percentage of your life you have spent at this university. Use
| brackets if you need them.

This is a slightly fiddly request, but just enter something. The course assumes that the current year is 2019.

(2019-2017)/(2019-1999)
## [1] 0.1
| All that practice is paying off!
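
The answer entered above gives the fraction 0.1 and leaves out the multiplication by 100 that the question asks for. A sketch of the full calculation with named values; the years 2017 and 1999 are the ones used in the log, and the variable names are illustrative.

```r
current_year <- 2019
start_year   <- 2017  # year of starting at the university (from the log above)
birth_year   <- 1999  # year of birth (from the log above)

# Brackets make sure both differences are computed before the division
fraction <- (current_year - start_year) / (current_year - birth_year)
percentage <- fraction * 100
print(percentage)
```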

  |================================                                                 |  40%
| If you use brackets and forget to add the closing bracket, the > on the command line
| changes into a +. The + can also mean that R is still busy with some heavy computation.
| If you want R to quit what it was doing and give back the >, press ESC (see the
| reference list on the last page of `A (very) short introduction to R`).

...

  |==================================                                               |  42%
| You can also give numbers a name. By doing so, they become so-called variables which can
| be used later. For example, you can type in the command window A = 4 .
A = 4
| You got it!

  |===================================                                              |  44%
| You can see that A appeared in the workspace window in the top right corner, which means
| that R now remembers what A is.

...

The message says workspace window, but in RStudio you select the Environment tab in the top-right pane to see the variables and other objects currently stored. It is useful to check this from time to time.

  |=====================================                                            |  46%
| Some people prefer te use <- instead of =. They do the same thing. <- consists of two
| characters, namely < and -, and represents an arrow pointing at the object receiving the
| value of the expression.

...

  |=======================================                                          |  48%
| You can also ask R what A is. Just type A in the command window.
A
## [1] 4
| You nailed it! Good job!

  |========================================                                         |  50%
| You can also do calculations with A. Type A * 5 .
A * 5
## [1] 20
| You are amazing!

  |==========================================                                       |  52%
| If you specify A again, it will forget what value it had before. You can also assign a
| new value to A using the old one. Type A = A + 10 .
A = A + 10
| Excellent work!
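
The steps with A can be replayed as a short script. This sketch uses <- for assignment, which behaves the same as = here, and the name product is introduced only for illustration.

```r
A <- 4

# Using A in a calculation does not change A itself
product <- A * 5
print(product)
print(A)

# Assigning to A again replaces the old value; the right-hand side is evaluated first
A <- A + 10
print(A)
```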

  |============================================                                     |  54%
| You can see that the value in the workspace window changed.

...

  |==============================================                                   |  56%
| To remove all variables from R's memory, type rm(list=ls()) .

> rm(list=ls())

| You nailed it! Good job!

  |===============================================                                  |  58%
| You see that the workspace window is now empty. You can also click the broom icon
| (`clear all`) in the workspace window. You can see that RStudio then empties the
| workspace window. If you only want to remove the variable A, you can type rm(A).

...
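
The difference between removing one variable and removing several can be sketched as follows. The variables are recreated here so the example stands on its own, and `exists()` is used only to check the outcome.

```r
A = 4
B = c(3, 4, 5)

rm(A)                            # remove only A
exists("A", inherits = FALSE)    # FALSE
exists("B", inherits = FALSE)    # TRUE

# The list argument takes a character vector of names;
# rm(list = ls()) passes every name in the workspace.
rm(list = c("B"))
exists("B", inherits = FALSE)    # FALSE
```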

  |=================================================                                |  60%
| Like in many other programs, R organizes numbers in scalars (a single number,
| 0-dimensional), vectors (a row of numbers, also called arrays, 1-dimensional) and
| matrices (like a table, 2-dimensional).

...

  |===================================================                              |  62%
| The A you defined before was a scalar. To define a vector with the numbers 3, 4 and 5,
| you need the function c, which is short for concatenate (paste together). Type (or
| copy-paste) B=c(3,4,5) .
B=c(3,4,5)
| You are quite good my friend!

  |====================================================                             |  65%
| Matrices and other 2-dimensional structures will be introduced later.

...

  |======================================================                           |  67%
| If you would like to compute the mean of all the elements in the vector B from the
| example above, you could type (3+4+5)/3. Try this
(3+4+5)/3
## [1] 4
| Nice work!

  |========================================================                         |  69%
| But when the vector is very long, this is very boring and time-consuming work. This is
| why things you do often are automated in so-called functions. Some functions are
| standard in R or in one of the packages. You can also program your own functions (you'll
| do that later).

...

  |=========================================================                        |  71%
| When you use a function to compute a mean, type mean(x=B)
mean(x=B)
## [1] 4
| Great job!

  |===========================================================                      |  73%
| Within the brackets you specify the arguments. Arguments give extra information to the
| function. In this case, the argument x says of which set of numbers (vector) the mean
| should computed (namely of B). Sometimes, the name of the argument is not necessary;
| mean(B) works as well. Try it.
mean(B)
## [1] 4
| You are quite good my friend!
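
A short sketch of named versus positional arguments, using the vector B from the lesson. The extra call with `na.rm` is an addition for illustration and is not part of the swirl lesson.

```r
B = c(3, 4, 5)

mean(x = B)    # argument given by name
mean(B)        # same argument given by position

# Named arguments may be written in any order.
mean(na.rm = TRUE, x = c(B, NA))
```

All three calls return 4.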

  |=============================================================                    |  75%
| Compute the sum of 4, 5, 8 and 11 by first combining them into a vector and then using
| the function sum. Use the function c inside the function sum.
sum(c(4,5,8))
## [1] 17
| That's not the answer I was looking for, but try again. Or, type info() for more
| options.

| Type sum(c(4,5,8,11))
sum(c(4,5,8,11))
## [1] 28
| Excellent work!

  |==============================================================                   |  77%
| The function rnorm, as another example, is a standard R function which creates random
| samples from a normal distribution. Type rnorm(10) and you will see 10 random numbers
rnorm(10)
##  [1]  1.506295298 -0.277327204  0.343297545  1.003989164 -1.206756533
##  [6]  0.465691141 -1.173310437  0.461724748  1.315150119 -0.006278288

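This generates 10 numbers from the normal distribution with mean = 0 and standard deviation sd = 1, known as the standard normal distribution. Different numbers are generated on every call, so it is fine if your output differs.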

| You are doing so well!

  |================================================================                 |  79%
| Here rnorm is the function and the 10 is an argument specifying how many random numbers
| you want - in this case 10 numbers (typing n=10 instead of just 10 would also work). The
| result is 10 random numbers organised in a vector with length 10.

...

  |==================================================================               |  81%
| Entering the same command again produces 10 new random numbers. Instead of typing the
| same text again, you can also press the upward arrow key (uparrow) to access previous
| commands. Try this.
rnorm(10)
##  [1]  0.11930935 -0.70423397 -1.15231450  0.03073386 -1.44955442  0.81144918
##  [7]  0.40975162  0.05326350 -0.54524261  1.89569988
| Nice work!

  |====================================================================             |  83%
| If you want 10 random numbers out of normal distribution with mean 1.2 and standard
| deviation 3.4 you can type rnorm(10, mean=1.2, sd=3.4). Try this.
rnorm(10, mean=1.2, sd=3.4)
##  [1]  2.7926525  0.2248958 -0.9185951  2.9623644 -1.3636092  5.3669220
##  [7]  6.8450458  1.6874460  2.2357418 -1.9569693
| You're the best!

  |=====================================================================            |  85%
| This shows that the same function (rnorm) may have different interfaces and that R has
| so called named arguments (in this case mean and sd).

...
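
The following sketch shows named arguments given in different orders. It uses `set.seed()`, which the lesson does not cover, so that both calls draw the same random numbers; the seed value 1 is arbitrary.

```r
set.seed(1)                             # fix the random number generator state
r1 = rnorm(10, mean = 1.2, sd = 3.4)

set.seed(1)                             # reset to the same state
r2 = rnorm(10, sd = 3.4, mean = 1.2)    # same named arguments, different order

identical(r1, r2)                       # TRUE
length(r1)                              # 10
```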

  |=======================================================================          |  88%
| Comparing this example to the previous one also shows that for the function rnorm only
| the first argument (the number 10) is compulsory, and that R gives default values to the
| other so-called optional arguments. Use the help function to see which values are used
| as default by typing ?rnorm.
> ?rnorm
| Excellent work!

  |=========================================================================        |  90%
| You see the help page for this function in the help window on the right. RStudio has a
| nice feature. When you type rnorm( in the command window and press TAB, RStudio will
| show the possible arguments.

...
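
Besides the help page, the default values can also be inspected from the console. This is a sketch using the base functions `args()` and `formals()`, which the lesson itself does not mention.

```r
# Print the argument list of rnorm, including default values.
args(rnorm)

# Extract individual defaults programmatically.
default_mean = formals(rnorm)$mean    # 0
default_sd   = formals(rnorm)$sd      # 1
```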

  |==========================================================================       |  92%
| You can also store the output of the function in a variable. Type x=rnorm(100).
x = rnorm(100)
| Excellent job!

  |============================================================================     |  94%
| Now 100 random numbers are assigned to the variable x, which becomes a vector by this
| operation.You can see it appear in the workspace window.

...

  |==============================================================================   |  96%
| R can also make graphs. Type plot(x) for a very simple example.
plot(x)

| Keep up the great work!

  |===============================================================================  |  98%
| The 100 random numbers are now plotted in the plots window on the right.

...

  |=================================================================================| 100%
| You now learned some basic R commands. We will look into more detail in the next two
| modules of `A (very) short introduction to R`.

...

| You've reached the end of this lesson! Returning to the main menu...

Module 1 is now complete.

6.4.3 Module 2

Next, we start Module 2.

| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Module 1
2: Module 2
3: Module 3

Selection: 2
  |                                                                                 |   0%

| Welcome to the second module of 'A (very) short introduction to R'.

...

  |==                                                                               |   2%
| Before you start working, you should set your working directory to where all your data
| and script files are or should be stored. Within RStudio you can go to 'Session'> 'Set
| working directory'. Please do this now.
> setwd("(path omitted)")
| You are really on a roll!

  |====                                                                             |   5%
| R is an interpreter that uses a command line based environment. This means that you have
| to type commands, rather than use the mouse and menus. This has the advantage that you
| do not always have to retype all commands and are less likely to get complaints of arms,
| neck and shoulders.

(Omitted.)

  |======                                                                           |   7%
| You can store your commands in files, the so-called scripts. These scripts have
| typically file names with the extension .R, e.g. foo.R.

...

  |========                                                                         |   9%
| You can open an editor window to edit these files by clicking 'File' and 'New'. Try
| this. Under 'File' you also find the options 'Open file...', 'Save' and 'Save as'.

...

  |=========                                                                        |  12%
| You can run (send to the console window) part of the code by selecting lines and
| pressing CTRL+ENTER or click 'Run' in the editor window. If you do not select anything,
| R will run the line your cursor is on.

...

  |===========                                                                      |  14%
| You can always run the whole script with the console command source, so e.g. for the
| script in the file foo.R you type source('foo.R'). You can also click 'Run all' in the
| editor window or type CTRL+SHIFT+S to run the whole script at once.

...

  |=============                                                                    |  16%
| Make a script called firstscript.R. Type in the script R-code that generates 100 random
| numbers and plots them. Save the script in your working directory (in the folder called
| 'swirl') and type source("firstscript.R") on the command line (here).

It is simplest to set the working directory to the swirl folder from the start.

Contents of “firstscript.R”:

x = rnorm(100)
plot(x)
> source("firstscript.R")

| Excellent work!

  |===============                                                                  |  19%
| Run your script again. The plot will change because new numbers are generated.

> source("firstscript.R")

| All that practice is paying off!

  |=================                                                                |  21%
| Vectors were already introduced, but they can do more. Make a vector with numbers 1, 4,
| 6, 8, 10 and call it vec1.
vec1 = c(1,4,6,8,10)
| Perseverance, that's the answer.

  |===================                                                              |  23%
| Elements in vectors can be addressed by standard [i] indexing. Select the 5th element of
| this vector by typing vec1[5].
vec1[5]
## [1] 10
| You are amazing!

  |=====================                                                            |  26%
| Replace the 3rd element with a new number by typing vec1[3]=12.
vec1[3]=12
| You got it right!

  |=======================                                                          |  28%
| Ask R what the new version is of vec1.
vec1
## [1]  1  4 12  8 10
| That's correct!

  |========================                                                         |  30%
| You can also see the numbers of vec1 in the workspace window. Make a new vector vec2
| using the seq() (sequence) function by typing seq(from=0, to=1, by=0.25) and check its
| values in the workspace window.
vec2 = seq(from=0, to=1, by=0.25)
| You are doing so well!

  |==========================                                                       |  33%
| Type sum(vec1).
sum(vec1)
## [1] 35
| Excellent job!

  |============================                                                     |  35%
| The function sum sums up the elements within a vector, leading to one number (a scalar).
| Now use + to add the two vectors.
vec1 + vec2
## [1]  1.00  4.25 12.50  8.75 11.00
| Excellent job!

  |==============================                                                   |  37%
| If you add two vectors of the same length, the first elements of both vectors are
| summed, and the second elements, etc., leading to a new vector of length 5 (just like in
| regular vector calculus).

...
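
Element-wise addition can be checked with the two vectors from the lesson. The multiplication by a single number is an extra illustration, not part of the lesson.

```r
vec1 = c(1, 4, 12, 8, 10)
vec2 = seq(from = 0, to = 1, by = 0.25)

vec_sum = vec1 + vec2    # 1.00 4.25 12.50 8.75 11.00
length(vec_sum)          # 5

# A single number is applied to every element.
vec1 * 2                 # 2 8 24 16 20
```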

  |================================                                                 |  40%
| Matrices are nothing more than 2-dimensional vectors. To define a matrix, use the
| function matrix. Make a matrix with matrix(data=c(9,2,3,4,5,6),ncol=3) and call it mat.
mat = matrix(data=c(9,2,3,4,5,6),ncol=3)
| Keep working like that and you'll get there!

  |==================================                                               |  42%
| The argument data specifies which numbers should be in the matrix. Use either ncol to
| specify the number of columns or nrow to specify the number of rows.

...
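
A small sketch of how `matrix` fills its cells. By default the data go in column by column; the `byrow = TRUE` variant and the name `mat_by_row` are additions for comparison and do not appear in the lesson.

```r
mat = matrix(data = c(9, 2, 3, 4, 5, 6), ncol = 3)
dim(mat)            # 2 3: the number of rows is derived from the data length
mat[1, 2]           # 3: filled column by column

mat_by_row = matrix(data = c(9, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE)
mat_by_row[1, 2]    # 2: filled row by row
```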

  |====================================                                             |  44%
| Put the numbers 31 to 60 in a vector named P. Use the function seq.
P = seq(from = 31, to = 60, by = 1)
| You got it right!

  |======================================                                           |  47%
| Put the numbers 31 to 60 in a matrix named Q with 6 rows and 5 columns.

> Q = matrix(P, nrow = 6, ncol = 5)

| Not quite, but you're learning! Try again. Or, type info() for more options.

| Type ?matrix for information on the function matrix.

I got stuck here. I also tried the following:

Q = matrix(data = P, nrow = 6, ncol = 5)
Q = matrix(data = P, ncol = 5)
Q = matrix(data = seq(from = 31, to = 60, by = 1), nrow = 6, ncol = 5)
Q = matrix(data = seq(from = 31, to = 60, by = 1), ncol = 5)

After looking things up with ?matrix and the like, I ran info():

> info()

| When you are at the R prompt (>):
| -- Typing skip() allows you to skip the current question.
| -- Typing play() lets you experiment with R on your own; swirl will ignore what you
| do...
| -- UNTIL you type nxt() which will regain swirl's attention.
| -- Typing bye() causes swirl to exit. Your progress will be saved.
| -- Typing main() returns you to swirl's main menu.
| -- Typing info() displays these options again.

> skip()

| Entering the following correct answer for you...
Q=matrix(seq(31,60),nrow=6)
| Great job!

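So this was the expected answer. The other attempts also construct Q correctly, but swirl does not accept an answer unless it matches exactly.

Judging from the answer, the question was probably meant to read: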

Put the numbers 31 to 60 in a matrix named Q with 6 rows.
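
To back up the claim that the rejected attempts build the same matrix, here is a sketch that compares them by value. The names Q1 to Q3 are hypothetical; `all.equal()` is used because it compares the numbers and dimensions rather than the exact storage type.

```r
P = seq(from = 31, to = 60, by = 1)

Q1 = matrix(data = P, nrow = 6, ncol = 5)    # rejected attempt
Q2 = matrix(data = P, ncol = 5)              # rejected attempt
Q3 = matrix(seq(31, 60), nrow = 6)           # the answer swirl entered

isTRUE(all.equal(Q1, Q3))    # TRUE
isTRUE(all.equal(Q2, Q3))    # TRUE
dim(Q3)                      # 6 5
```
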
  |========================================                                         |  49%
| Ask R what Q looks like.
Q
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   31   37   43   49   55
## [2,]   32   38   44   50   56
## [3,]   33   39   45   51   57
## [4,]   34   40   46   52   58
## [5,]   35   41   47   53   59
## [6,]   36   42   48   54   60
| Nice work!

  |=========================================                                        |  51%
| You can also click on Q in the workspace window. Look at the different ways scalars,
| vectors and matrices are denoted in the workspace window. Matrix operations are similar
| to vector operations. Elements of a matrix can be addressed in the usual way as
| [row,column]. Select the element on the first row and in the second column of the matrix
| called mat.
mat[1,2]
## [1] 3
| Excellent work!

  |===========================================                                      |  53%
| When you want to select a whole row, you leave the spot for the column number empty (the
| other way around for columns of course). Select the second row.
mat[2,]
## [1] 2 4 6
| You got it!

  |=============================================                                    |  56%
| Many functions also work with matrices as argument. Compute the mean of mat.
mean(mat)
## [1] 4.833333
| Your dedication is inspiring!

  |===============================================                                  |  58%
| The third type of data structure treated here is the data frame. Time series are often
| ordered in data frames. A data frame is a matrix with names above the columns. This is
| nice, because you can call and use one of the columns without knowing in which position
| it is. Make a data frame with t = data.frame(x = c(11,12,14), y = c(19,20,21), z =
| c(10,9,7)).
t = data.frame(x = c(11,12,14), y = c(19,20,21), z = c(10,9,7))
| You are doing so well!

  |=================================================                                |  60%
| Ask R what t is.
t
| Great job!

  |===================================================                              |  63%
| The data frame is called t and the columns have the names x, y and z. You can select one
| column by typing t$z. Try this.
t$z
## [1] 10  9  7
| You are doing so well!

  |=====================================================                            |  65%
| Another option is to type t[["z"]]. Try this as well.
t[["z"]]
## [1] 10  9  7
| Excellent job!
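
The two forms above, together with matrix-style indexing, all return the same column. The `[ , ]` forms are an addition for comparison and are not asked for by the lesson.

```r
t = data.frame(x = c(11, 12, 14), y = c(19, 20, 21), z = c(10, 9, 7))

by_dollar  = t$z         # by name with $
by_bracket = t[["z"]]    # by name with [[ ]]
by_colname = t[, "z"]    # matrix-style, by column name
by_colnum  = t[, 3]      # matrix-style, by position (requires knowing the position)
```

Each of the four results is the vector 10, 9, 7.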

  |=======================================================                          |  67%
| Compute the mean of column z in data frame t.
mean(t$z)
## [1] 8.666667
| Keep working like that and you'll get there!

  |=========================================================                        |  70%
| In the following question you will be asked to modify a script that will appear as soon
| as you move on from this question. When you have finished modifying the script, save
| your changes to the script and type submit() and the script will be evaluated. There
| will be some comments in the script that opens up. Be sure to read them!

...

  |==========================================================                       |  72%
| Make a script file which constructs three random normal vectors of length 100. Call
| these vectors x1, x2 and x3. Make a data frame called t with three columns (called a, b
| and c) containing respectively x1, x1+x2 and x1+x2+x3. Call plot(t) for this data frame.

Once you have finished editing, type submit().

Contents of dataframe.R after editing:

# Text behind the #-sign is not evaluated as code by R. 
# This is useful, because it allows you to add comments explaining what the script does.

# In this script, replace the ... with the appropriate commands.

x1 = rnorm(100)
x2 = rnorm(100)
x3 = rnorm(100)
t = data.frame(a = x1, b = x1 + x2, c = x1 + x2 + x3)
plot(t)
> submit()

| Sourcing your script...

Result of running the script (the plot is omitted here)

| You got it!

  |============================================================                     |  74%
| Can you understand the results?

...

  |==============================================================                   |  77%
| Another basic structure in R is a list. The main advantage of lists is that the
| `columns` (they are not really ordered in columns any more, but are more a collection of
| vectors) don't have to be of the same length, unlike matrices and data frames. Make this
| list L = list(one=1, two=c(1,2), five=seq(0, 1, length=5)).
L = list(one=1, two=c(1,2), five=seq(0, 1, length=5))
| You are really on a roll!

  |================================================================                 |  79%
| The list L has names and values. You can type L to see the contents.
L
## $one
## [1] 1
## 
## $two
## [1] 1 2
## 
## $five
## [1] 0.00 0.25 0.50 0.75 1.00
| Great job!

  |==================================================================               |  81%
| L also appeared in the workspace window. To find out what's in the list, type names(L).
names(L)
## [1] "one"  "two"  "five"
| Keep working like that and you'll get there!

  |====================================================================             |  84%
| Add 10 to the column called five.
L$five + 10
## [1] 10.00 10.25 10.50 10.75 11.00
| All that practice is paying off!
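
Note that `L$five + 10` only prints the result. A sketch of how one might store it back into the list, using `length.out` as the full name of the argument that the lesson abbreviates to `length`:

```r
L = list(one = 1, two = c(1, 2), five = seq(0, 1, length.out = 5))

L$five + 10            # prints the sum; L itself is unchanged
L$five                 # still 0.00 0.25 0.50 0.75 1.00

L$five = L$five + 10   # assign the result to change the list
L$five                 # 10.00 10.25 10.50 10.75 11.00

lengths(L)             # elements may have different lengths: 1, 2 and 5
```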

  |======================================================================           |  86%
| Plotting is an important statistical activity. So it should not come as a surprise that
| R has many plotting facilities. Type plot(rnorm(100), type="l", col="gold").
plot(rnorm(100), type="l", col="gold")

| Perseverance, that's the answer.

  |========================================================================         |  88%
| Hundred random numbers are plotted by connecting the points by lines in a gold color.

...

  |=========================================================================        |  91%
| Another very simple example is the classical statistical histogram plot, generated by
| the simple command hist. Make a histogram of 100 random numbers.
hist(rnorm(100))

| You got it right!

  |===========================================================================      |  93%
| The script that opens up is the same as the script you made before, but with more
| plotting commands. Type submit() on the command line to run it (you don't have to change
| anything yet).

Contents of plot_dataframe.R:

# Text behind the #-sign is not evaluated as code by R. 
# This is useful, because it allows you to add comments explaining what the script does.

# Make data frame
x1 = rnorm(100)
x2 = rnorm(100)
x3 = rnorm(100)
t = data.frame(a=x1, b=x1+x2, c=x1+x2+x3)

# Plot data frame
plot(t$a, type='l', ylim=range(t), lwd=3, col=rgb(1,0,0,0.3))
lines(t$b, type='s', lwd=2, col=rgb(0.3,0.4,0.3,0.9))
points(t$c, pch=20, cex=4, col=rgb(0,0,1,0.3))

# Note that with plot you get a new plot window while points and lines add to the previous plot.
> submit()

| Sourcing your script...

| Keep working like that and you'll get there!

  |=============================================================================    |  95%
| Try to find out by experimenting what the meaning is of rgb, the last argument of rgb,
| lwd, pch, cex. Type play() on the command line to experiment. Modify lines 11, 12 and 13
| of the script by putting your cursor there and pressing CTRL+ENTER. When you are
| finished, type nxt() and then ?par.

> ?par

Take a look at the various options.

| You got it right!

  |===============================================================================  |  98%
| You searched for par in the R help. This is a useful page to learn more about formatting
| plots. Google 'R color chart' for a pdf file with a wealth of color options.

...
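
For readers who want a starting point for the experiment, here is a minimal sketch of the graphical parameters used in the script. The data `x` and the name `semi_red` are hypothetical.

```r
# rgb(red, green, blue, alpha): each value lies between 0 and 1 by default;
# the last argument (alpha) sets the opacity, 0 = transparent, 1 = opaque.
semi_red = rgb(1, 0, 0, 0.3)
semi_red                      # a hexadecimal color string

x = 1:10
plot(x, type = "l", lwd = 3, col = semi_red)              # lwd: line width
points(x, pch = 20, cex = 4, col = rgb(0, 0, 1, 0.3))     # pch: plotting symbol, cex: size multiplier
```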

  |=================================================================================| 100%
| To copy your plot to a document, go to the plots window, click the 'Export' button,
| choose the nicest width and height and click 'Copy' or 'Save'.

...

| You've reached the end of this lesson! Returning to the main menu...

This completes Module 2.

6.4.4 Module 3

Next, we start Module 3.

| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Module 1
2: Module 2
3: Module 3

Selection: 3
  |                                                                                 |   0%

| Welcome to the third (and last) module of 'A (very) short introduction to R'.

...

  |==                                                                               |   2%
| Before you start working, you should set your working directory to where all your data
| and script files are or should be stored. Within RStudio you can go to 'Tools'> 'Set
| working directory'. Please do this now.

...

  |===                                                                              |   4%
| There are many ways to write data from within the R environment to files, and to read
| data from files. We will illustrate one way here. The following examples illustrate the
| essential.

...

  |=====                                                                            |   6%
| Construct a simple example data frame and store it in the variable d by typing d =
| data.frame(a = c(3,4,5), b = c(12,43,54)) .
d = data.frame(a = c(3,4,5), b = c(12,43,54))
| You got it!

  |=======                                                                          |   8%
| Ask R what d is.
d
| Nice work!

  |========                                                                         |  10%
| write this data frame to a text file called tst0.txt by typing write.table(d,
| file="tst0.txt", row.names=FALSE). The argument row.names=FALSE prevents that row names
| are written to the file. Because nothing is specified about col.names, the default
| option col.names=TRUE is chosen and column names are written to the file.

> write.table(d, file="tst0.txt", row.names=FALSE)

| You are amazing!

  |==========                                                                       |  12%
| The file is now stored in your working directory (the folder ending with swirl). Find it
| on your computer and open the file in an editor (such as Notepad) to see what it looks
| like.

...

  |============                                                                     |  15%
| Read the file into a data frame. Type d2 = read.table(file="tst0.txt", header=TRUE).

> d2 = read.table(file="tst0.txt", header=TRUE)

| Your dedication is inspiring!

  |==============                                                                   |  17%
| Note that the column names are also read (because you specified header=TRUE). The data
| frame also appears in the workspace window. Click on it to see what it looks like.

...
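
The write and read steps can be tried as a round trip. This sketch writes to a temporary file created with `tempfile()` instead of tst0.txt, so it does not depend on the working directory; the name `path` is hypothetical.

```r
d = data.frame(a = c(3, 4, 5), b = c(12, 43, 54))

path = tempfile(fileext = ".txt")                  # temporary file, stands in for tst0.txt
write.table(d, file = path, row.names = FALSE)

file_lines = readLines(path)                       # one header line plus three data lines
length(file_lines)                                 # 4

d2 = read.table(file = path, header = TRUE)
names(d2)                                          # "a" "b"
all(d == d2)                                       # TRUE: the values survive the round trip
```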

  |===============                                                                  |  19%
| Make a file called tst1.txt in Notepad from the example in the script that opens up and
| store it in your working directory. Modify the script such that it reads this file,
| multiplies the column called g by 5 and stores it as tst2.txt. Save the script and type
| submit().

Contents of tst1.txt (create this as a text file):

a  g  x 
1  2  3
2  4  6
4  8  12
8  16 24
16 32 48
32 64 96

Contents of readfiles.R:

# Make a file called tst1.txt in Notepad with the data below (remove the #-signs) and store it in your working directory.

# a  g  x 
# 1  2  3
# 2  4  6
# 4  8  12
# 8  16 24
# 16 32 48
# 32 64 96


# Replace in this script the ... with the appropriate commands.


# Read the file tst1.txt.
d3 = read.table(file="tst1.txt", header = TRUE)


# Multiply the column called g by 5. 
d3$g = d3$g*5 


# Write the new table to file as tst2.txt.
write.table(d3, file="tst2.txt", row.names=FALSE)
> submit()

| Sourcing your script...


| You are amazing!

  |=================                                                                |  21%
| When you work with real data, you will encounter missing values because instrumentation
| failed or because you didn't want to measure in the weekend. When a data point is not
| available, you write NA instead of a number.

...

  |===================                                                              |  23%
| Type j = c(1,2,NA).
j = c(1,2,NA)
| You got it!

  |====================                                                             |  25%
| Compute the maximum of this vector j.
max(j)
## [1] NA
| Great job!

  |======================                                                           |  27%
| Computing statistics of incomplete data sets is strictly speaking not possible. Maybe
| the largest value occurred during the weekend when you did not measure. Therefore, R
| will say that it doesn't know what the largest value of j is.

...

  |========================                                                         |  29%
| If you don't mind about the missing data and want to compute the statistics anyway, you
| can add the argument na.rm=TRUE. (Should I remove the NAs? Yes!). Try this.
max(j, na.rm = TRUE)
## [1] 2
| Excellent work!
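
A short sketch of how missing values behave in other summary functions. `is.na()` and `mean()` are used here in addition to the `max()` call from the lesson.

```r
j = c(1, 2, NA)

is.na(j)                 # FALSE FALSE TRUE
sum(is.na(j))            # 1: number of missing values

mean(j)                  # NA, for the same reason as max(j)
mean(j, na.rm = TRUE)    # 1.5: the NA is dropped before averaging
```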

  |=========================                                                        |  31%
| The exercises you did before were nearly all with numbers. Sometimes you want to specify
| something which is not a number, for example the name of a measurement station or data
| file. In that case you want the variable to be a character string instead of a number.

...

  |===========================                                                      |  33%
| An object in R can have several so-called classes. The most important three are numeric,
| character and POSIX (date-time combinations). You can ask R what class a certain
| variable is by typing class(...).

...
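
A few calls to `class()` for the three kinds of object mentioned above. The example values are arbitrary.

```r
class(4)             # "numeric"
class("apples")      # "character"

# Date-time objects carry a POSIX class; note that the result has two elements.
class(Sys.time())                                  # "POSIXct" "POSIXt"
class(strptime("20170225", format = "%Y%m%d"))     # "POSIXlt" "POSIXt"
```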

  |=============================                                                    |  35%
| To tell R that something is a character string, you should type the text between
| apostrophes, otherwise R will start looking for a defined variable with the same name.

...

  |==============================                                                   |  38%
| Define this variable m = "apples" .
m = "apples"
| You got it right!

  |================================                                                 |  40%
| Look how the variable m is mentioned in the workspace window. Now try n = pears (so
| without the apostrophes). Look at the error message you get and then type n = "pears".

> n = pears
Error: object 'pears' not found
> n = "pears"

| You got it!

  |==================================                                               |  42%
| Of course, you cannot do computations with character strings. Try m+2 . Look at the
| error message you get and type class(m).

> m+2
Error in m + 2 : non-numeric argument to binary operator
class(m)
## [1] "character"
| Great job!

  |===================================                                              |  44%
| Dates and times are complicated. R has to know that 3 o'clock comes after 2:59 and that
| February has 29 days in some years. The easiest way to tell R that something is a
| date-time combination is with the function strptime.

...

  |=====================================                                            |  46%
| Create a vector of dates by typing (or copy-pasting) date1 = strptime(
| c("20170225230000", "20170226000000", "20170226010000"), format="%Y%m%d%H%M%S") .
date1 = strptime(c("20170225230000", "20170226000000", "20170226010000"), format="%Y%m%d%H%M%S")
| All that hard work is paying off!

  |=======================================                                          |  48%
| The numbers in the vectors are between apostrophes because the function strptime needs
| character strings as input. The argument format specifies how the character string
| should be read. In this case the year is denoted first (%Y), then the month (%m), day
| (%d), hour (%H), minute (%M) and second (%S). You don't have to specify all of them, as
| long as the format corresponds to the character string.

...
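
The sketch below shows what the parsed date-times can be used for, and what happens when the format does not correspond to the string. The argument `tz = "UTC"` is an assumption added here so the result does not depend on the local time zone; the lesson does not set it.

```r
date1 = strptime(c("20170225230000", "20170226000000"),
                 format = "%Y%m%d%H%M%S", tz = "UTC")

# R knows that midnight follows 23:00 of the previous day.
gap = difftime(date1[2], date1[1], units = "hours")
as.numeric(gap)                       # 1

# Dates can be printed in a different layout with format().
format(date1, "%Y-%m-%d %H:%M")       # "2017-02-25 23:00" "2017-02-26 00:00"

# If the format does not match the string, the result is NA.
bad = strptime("2017-02-25", format = "%Y%m%d", tz = "UTC")
is.na(bad)                            # TRUE
```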

  |========================================                                         |  50%
| Ask R what date1 is. Also see how date1 appears in the workspace window.
date1
## [1] "2017-02-25 23:00:00 JST" "2017-02-26 00:00:00 JST"
## [3] "2017-02-26 01:00:00 JST"
| All that practice is paying off!

  |==========================================                                       |  52%
| Make a graph with on the x-axis today, Sinterklaas 2017 and your next birthday and on
| the y-axis the number of presents you expect on each of these days. Type this in the
| script, save it and type submit().

Contents of plotdates.R after editing (the dates will differ from person to person):


# Replace in this script the ... with the appropriate commands.

# Make a vector called dates with the three dates:
# today, Sinterklaas (a Dutch holiday) and your next birthday.
dates = strptime(c("20200505","20201205", "20210401"), format="%Y%m%d")

# Make a vector called presents with the three expected number of presents.
presents = c(1, 3, 2)

# Plot the dates versus the number of presents.
plot(dates, presents)
> submit()

| Sourcing your script...

| All that practice is paying off!

  |============================================                                     |  54%
| When you are building a larger program than in the examples above or if you are using
| someone else's scripts, you may encounter some programming statements. In the second
| half of this module we describe a few tips and tricks.

...

  |==============================================                                   |  56%
| The if-statement is used when certain computations should only be done when a certain
| condition is met (and maybe something else should be done when the condition is not
| met).

...

  |===============================================                                  |  58%
| The script that opens up contains an if statement. Read it (you don't have to change
| anything) and type submit() to execute the script.

You do not need to edit the script below.

ifstatement.R

# define a variable called w
w = 3

if(w < 5){
  d=2
}else{
  d=10
}

# First a condition is specified: w should be less than 5.
# If the condition is met, R will execute what is between the first brackets.
# If the condition is not met, R will execute what is between the second brackets, after the else. 
# You can leave the else{...}-part out if you don't need it.
> submit()

| Sourcing your script...
| You are really on a roll!

  |=================================================                                |  60%
| Ask what d is.
d
## [1] 2
| You nailed it! Good job!

  |===================================================                              |  62%
| In this case, the condition is met and d has been assigned the value 2.

...
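
To see both branches of the if-statement, the following sketch wraps the same condition in a small function. The name `choose_d` is hypothetical and not part of the lesson.

```r
# choose_d() is a hypothetical wrapper around the if/else from ifstatement.R.
choose_d <- function(w) {
  if (w < 5) {
    d <- 2      # executed when the condition w < 5 is met
  } else {
    d <- 10     # executed otherwise
  }
  d
}

choose_d(3)   # 2: the condition is met, as in the lesson
choose_d(7)   # 10: the condition is not met
```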

  |====================================================                             |  65%
| To get a subset of points in a vector for which a certain condition holds, you can use a
| shorter method. Make one vector with a = c(1,2,3,4) .
a = c(1,2,3,4)
| Excellent work!

  |======================================================                           |  67%
| Make another vector with b = c(5,6,7,8) .
b = c(5,6,7,8)
| All that hard work is paying off!

  |========================================================                         |  69%
| Now type f = a[b==5 | b==8]
f = a[b==5 | b==8]
| That's a job well done!

  |=========================================================                        |  71%
| And ask what f is.
f
## [1] 1 4
| Nice work!

  |===========================================================                      |  73%
| The vector f is composed of those elements of vector a for which b equals 5 or b equals
| 8.

...
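
It may help to look at the intermediate logical vector. A minimal sketch, where the name `keep` is introduced only for illustration:

```r
a <- c(1, 2, 3, 4)
b <- c(5, 6, 7, 8)

# The condition is evaluated element by element and gives a logical vector.
keep <- b == 5 | b == 8
keep          # TRUE FALSE FALSE TRUE

# Indexing a with that logical vector keeps the elements where it is TRUE.
f <- a[keep]
f             # 1 4

# which() reports the positions that are TRUE.
which(keep)   # 1 4
```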

  |=============================================================                    |  75%
| Note the double = in the condition. Other conditions (also called logical or Boolean
| operators) are <, >, != (not equal to), <= (less than or equal to) and >= (greater than
| or equal to).

...

  |==============================================================                   |  77%
| To test more than one condition in one if-statement, use & if both conditions have to be
| met ('and') and | if at least one of the conditions has to be met ('or').

...
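
A short sketch of combined conditions. The vector `x` and the value `v` are made-up examples. The `&&` form is not mentioned in the lesson; it is the base R operator for combining single TRUE/FALSE values, which is what an if-statement expects.

```r
x <- c(2, 6, 9, 12)

# 'and': both conditions must hold for an element.
x > 5 & x < 10    # FALSE TRUE TRUE FALSE

# 'or': at least one condition must hold for an element.
x < 5 | x > 10    # TRUE FALSE FALSE TRUE

# Inside if(), the condition should be a single TRUE or FALSE.
v <- 7
if (v > 5 && v < 10) {
  msg <- "between 5 and 10"
} else {
  msg <- "outside"
}
msg               # "between 5 and 10"
```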

  |================================================================                 |  79%
| If you want to model a time series, you usually do the computations for one time step
| and then for the next and the next, etc. Because nobody wants to type the same commands
| over and over again, these computations are automated in for-loops.

...

  |==================================================================               |  81%
| In a for-loop you specify what has to be done and how many times. To tell how many
| times, you specify a so-called counter.

...

  |====================================================================             |  83%
| The script that opens up contains a for-loop. Read it (you don't have to change
| anything) and type submit() to execute the script.

You do not need to edit this script.

Contents of forloop.R


# Define a vector called h.

h = seq(from=1, to=8)


# Create an empty vector called s. 
# This is necessary because when you introduce a variable within the for-loop, 
# R will not remember it when it has gotten out of the for-loop.

s = c()


# The next lines contain the for-loop.
# In this case, i is the counter and runs from 2 to 10.
# Everything between the curly brackets is processed 9 times. 
# The first time i=2, the second element of h is multiplied with 10 
# and placed in the second position of the vector s. 
# The second time i=3, etc. 
# In the last two runs, the 9th and 10th elements of h are requested, which do not exist. 
# Note that these statements are evaluated without any explicit error messages.

for(i in 2:10) 
{
  s[i] = h[i] * 10
}
> submit()

| Sourcing your script...


| Perseverance, that's the answer.
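
The lesson does not print `s`, so here is a sketch that reruns the loop and inspects the result. One note on the script's comment: in base R a for-loop does not have its own scope, so the practical reason for `s = c()` is that the indexed assignment `s[i] = ...` needs an object called `s` to exist already.

```r
h <- seq(from = 1, to = 8)
s <- c()

for (i in 2:10) {
  s[i] <- h[i] * 10
}

# Position 1 was never assigned, and h[9] and h[10] do not exist,
# so those three positions hold NA.
s           # NA 20 30 40 50 60 70 80 NA NA
length(s)   # 10
h[9]        # NA: indexing past the end gives NA rather than an error
```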

  |=====================================================================            |  85%
| Make a vector called z containing numbers from 1 to 100 in the script that opens up.
| Make a for-loop which runs through the whole vector. Within the for-loop, multiply the
| elements which are smaller than 5 or larger than 90 with 10 and the other elements with
| 0.1.

Contents of exercise_forloop.R


# Replace the ... with the appropriate commands.

# Make vector called z containing numbers from 1 to 100.
z = seq(1,100)

# Make the for-loop with the range to run over in round brackets, 
# the things to do each run in curly brackets,
# and an if-statement between the curly brackets.

for(i in z)
{
  if(z[i] < 5 | z[i] > 90){
    z[i] = z[i] * 10
  }else{
    z[i] = z[i] * 0.1
  }
}
> submit()

| Sourcing your script...


| Your dedication is inspiring!
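
In the answer above, `for(i in z)` works because `z` initially holds 1 to 100, so each value is also a position. A variant using `seq_along(z)`, a base R function not used in the lesson, loops over positions whatever the vector contains:

```r
z <- seq(1, 100)

# seq_along(z) gives the positions 1, 2, ..., length(z).
for (i in seq_along(z)) {
  if (z[i] < 5 | z[i] > 90) {
    z[i] <- z[i] * 10
  } else {
    z[i] <- z[i] * 0.1
  }
}

head(z)   # first elements: multiplied by 10 up to position 4, by 0.1 after
tail(z)   # last elements: all multiplied by 10
```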

  |=======================================================================          |  88%
| You can also program functions yourself. These work in the same way as pre-programmed R
| functions.

...

  |=========================================================================        |  90%
| The script that opens up contains an example of a self-programmed function. Read it (you
| don't have to change anything) and type submit() to execute the script.

You do not need to edit this script.

Contents of function.R

fun1 = function(arg1, arg2)
{
  w = arg1 ^ 2
  return(arg2 + w)
}

# In line 1 the function name (fun1) and its arguments (arg1 and arg2) are defined.
# Lines 2-5 specify what the function should do if it is called. 
# The return value (arg2+w) is given as output.
> submit()

| Sourcing your script...


| All that hard work is paying off!

  |==========================================================================       |  92%
| You can see that the function appeared in your workspace window (you may have to scroll
| down). To call the function with arguments 3 and 5, type fun1(arg1 = 3, arg2 = 5) .
fun1(arg1 = 3, arg2 = 5)
## [1] 14
| Excellent work!

  |============================================================================     |  94%
| Try to understand the answer you get. When you call the function, R fills out the value
| 3 for each occurrence of arg1 and 5 for arg2.

...
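
To make the substitution concrete, this sketch repeats `fun1` from function.R and calls it a few ways. The extra calls are illustrations and not part of the lesson.

```r
fun1 <- function(arg1, arg2) {
  w <- arg1^2
  return(arg2 + w)
}

fun1(arg1 = 3, arg2 = 5)   # 3^2 + 5 = 14
fun1(3, 5)                 # 14: arguments matched by position
fun1(arg2 = 5, arg1 = 3)   # 14: with names, the order does not matter
fun1(5, 3)                 # 5^2 + 3 = 28: swapping positions changes the result
```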

  |==============================================================================   |  96%
| Write a function (called fun2) for the previous exercise, so that you can feed it any
| vector you like (as argument). Use a for-loop in the function to do the computation with
| each element. Use the standard R function length in the specification of the counter.

exercise_function.R: my input (this is wrong!)


# Put the for-loop you made before inside a function.
# Use the same structure as fun1, 
# so copy and modify the first and last lines of this script.
# Call your new function fun2, with one argument called z.


fun2 = function(z)
{
  for(i in 1:length(z))
  {
    if(z[i] < 5 | z[i] > 90){
      z[i] = z[i] * 10
    }else{
      z[i] = z[i] * 0.1
    }
  }
}
> submit()

| Sourcing your script...


| Not quite, but you're learning! Try again.

> skip()

| I just sourced the following script, which demonstrates one possible solution.

exercise_function_correct.R: the correct answer

# Put the for-loop you made before inside a function.
# Use the same structure as fun1, 
# so copy and modify the first and last lines of this script.
# Call your new function fun2, with one argument called z.

fun2 = function(z)
{
  for(i in 1:length(z))
  {
    if(z[i] < 5 | z[i] > 90){
      z[i] = z[i] * 10
    }else{
      z[i] = z[i] * 0.1
    }
  }
  return(z)
}

Press Enter when you are ready to continue...

| Keep up the great work!
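
The only difference between the two versions is `return(z)`. The sketch below shows why that matters: the modified `z` exists only inside the function, and a function whose last expression is a for-loop returns NULL. The name `fun2_no_return` is hypothetical and stands for the first attempt.

```r
# First attempt: the loop changes a local copy of z, but nothing is returned.
fun2_no_return <- function(z) {
  for (i in 1:length(z)) {
    if (z[i] < 5 | z[i] > 90) {
      z[i] <- z[i] * 10
    } else {
      z[i] <- z[i] * 0.1
    }
  }
}

# Corrected version: return the modified vector.
fun2 <- function(z) {
  for (i in 1:length(z)) {
    if (z[i] < 5 | z[i] > 90) {
      z[i] <- z[i] * 10
    } else {
      z[i] <- z[i] * 0.1
    }
  }
  return(z)
}

is.null(fun2_no_return(1:100))   # TRUE: the value of a for-loop is NULL
fun2(c(1, 50, 95))               # the three elements, each rescaled
```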

  |===============================================================================  |  98%
| Actually, people often use more for-loops than necessary. The previous exercise can be
| done more easily and quickly without a for-loop but with regular vector computations.

...
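
The lesson does not show the vectorized version, so here is one possible sketch using the base R function `ifelse()`. The name `fun2_vec` is an assumption made for this example.

```r
# Vectorized alternative to the for-loop in fun2 (hypothetical name).
fun2_vec <- function(z) {
  # The condition is evaluated for all elements at once;
  # ifelse() picks z * 10 where it is TRUE and z * 0.1 where it is FALSE.
  ifelse(z < 5 | z > 90, z * 10, z * 0.1)
}

z <- 1:100
out <- fun2_vec(z)
out[c(1, 4, 5, 90, 91, 100)]
```

The selected positions sit on either side of the two thresholds, which makes it easy to compare against the for-loop version.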

  |=================================================================================| 100%
| You finished 'A (very) short introduction to R'. You can use the pdf version of this
| short course as a reference. The last two pages of that document also contains a list of
| useful commands that you can use as inspiration for your programming.

...

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: A (very) short introduction to R
2: R Programming
3: Take me to the swirl course repository!

Selection: 0

| Leaving swirl now. Type swirl() to resume.

>